Assessment of Pansharpening Methods Applied to WorldView-2 Imagery Fusion

Since WorldView-2 (WV-2) images are widely used in many fields, there is a high demand for high-quality pansharpened WV-2 images for different application purposes. Given the novelty of the WV-2 multispectral (MS) and panchromatic (PAN) bands, the performance of eight state-of-the-art pansharpening methods was assessed in this study on six datasets from three WV-2 scenes, using both quality indices and information indices, along with visual inspection. The normalized difference vegetation index, normalized difference water index, and morphological building index, which are widely used in applications such as land cover classification and the extraction of vegetation areas, buildings, and water bodies, were employed to evaluate the performance of the different pansharpening methods in terms of information presentation ability. The experimental results show that the haze- and ratio-based (HR) method, the adaptive Gram-Schmidt (GSA) method, and the Generalized Laplacian Pyramid (GLP) methods using the enhanced spectral-distortion-minimizing model and the enhanced context-based decision model are good choices for producing fused WV-2 images for image interpretation and the extraction of urban buildings. The two GLP-based methods are better choices than the other methods if the fused images will be used for applications related to vegetation and water bodies.


Introduction
The WorldView-2 (WV-2) satellite, launched in October 2009, offers eight multispectral (MS) bands of 1.84-m spatial resolution and a panchromatic (PAN) band of 0.46-m spatial resolution [1]. The MS bands cover the spectrum from 400 nm to 1050 nm and include four conventional visible and near-infrared MS bands: blue (B, 450-510 nm), green (G, 510-580 nm), red (R, 630-690 nm), and near-IR1 (NIR1, 770-895 nm); and four new bands: coastal (C, 400-450 nm), yellow (Y, 585-625 nm), red edge (RE, 705-745 nm), and near-IR2 (NIR2, 860-1040 nm). The PAN band has a spectral response range of 450-800 nm, which covers a shorter NIR spectral range than the common PAN bands of 450-900 nm. WV-2 images have been widely used in various fields, e.g., geological structure interpretation [1], Antarctic land cover mapping [2], bamboo patch mapping [3], high-density biomass estimation for wetland vegetation [4], mapping natural vegetation on a coastal site [5], predicting forest structural parameters [6], and especially the detection of urban objects. Since numerous applications need high-spatial-resolution (HSR) MS images, it is highly desirable to fuse the eight MS bands and the PAN band to produce HSR MS imagery for better monitoring of the Earth's surface.
Numerous pansharpening methods have been proposed in recent decades to produce spatially enhanced MS images by fusing the MS and PAN images. These methods fall into two categories: the component substitution (CS) family and the multi-resolution analysis (MRA) family. The CS approaches substitute a component obtained by a spectral transformation of the MS bands with the PAN image, whereas the MRA approaches inject spatial details extracted from the PAN image into the MS bands.
Four comprehensive indices, including the Dimensionless Global Relative Error of Synthesis (ERGAS) [44], the Spectral Angle Mapper (SAM) [45], Q2n [46,47], and the spatial correlation coefficient (SCC) [48], were employed to measure the spectral and spatial distortions between the fused images and the original MS bands. Regarding the application purposes of the high-resolution fused images, which include land cover classification of urban or suburban areas, bamboo and forest mapping, and so on, several widely used indexes derived from the fusion products were assessed to evaluate the information presentation ability of the fusion products. The employed indexes are the morphological building index (MBI) [49], the normalized difference vegetation index (NDVI), and the normalized difference water index (NDWI). The information presentation of a fusion product was assessed using the correlation coefficient (CC) between an index derived from the fusion product and the same index derived from the corresponding original MS image. A higher CC value implies a better information preservation ability of the fusion product.
This paper is organized as follows: the eight selected pansharpening methods are introduced in Section 2, as well as the quality indexes; the experimental results with visual and quantitative comparisons with other outstanding fusion methods are presented in Section 3. Discussions are presented in Section 4, whereas the conclusions are summarized in Section 5.

Algorithms
The algorithms used for the comparisons are introduced in the following subsections. MS and P represent the original low-resolution MS image and the high-resolution PAN image, respectively, while $\widetilde{MS}$ and $\widehat{MS}$ represent the up-sampled MS image and the fused MS image, respectively. A general formulation of CS fusion is given by Equation (1):

$\widehat{MS}_i = \widetilde{MS}_i + g_i (P - I_L)$  (1)

in which the subscript i indicates the ith spectral band and g_i is the injection gain of the ith band, while the intensity image I_L is defined by Equation (2):

$I_L = \sum_{i=1}^{N} w_i \widetilde{MS}_i$  (2)

where w_i is the weight of the ith MS band and N is the number of MS bands. Similarly, a general formulation for MRA-based methods is given by Equation (3):

$\widehat{MS}_i = \widetilde{MS}_i + g_i (P - P_L)$  (3)

where P_L is the low-frequency component of the PAN band. P_L can be obtained by different approaches, such as a low-pass filter, the Laplacian pyramid, or wavelet decomposition.
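As a concrete illustration, the two generic formulations of Equations (1)-(3) can be sketched in a few lines of NumPy; the array shapes, function names, and toy data below are illustrative assumptions, not code from the paper.

```python
# Sketch of the generic CS and MRA fusion formulas (Equations (1)-(3)).
# 'ms_up' is the up-sampled MS cube (N bands), 'pan' the PAN band.
import numpy as np

def cs_fuse(ms_up, pan, weights, gains):
    """Component substitution: MS_i + g_i * (P - I_L), I_L = sum_i w_i * MS_i."""
    intensity = np.tensordot(weights, ms_up, axes=1)         # Equation (2)
    return ms_up + gains[:, None, None] * (pan - intensity)  # Equation (1)

def mra_fuse(ms_up, pan, pan_low, gains):
    """Multi-resolution analysis: MS_i + g_i * (P - P_L)."""
    return ms_up + gains[:, None, None] * (pan - pan_low)    # Equation (3)

rng = np.random.default_rng(0)
ms_up = rng.random((4, 8, 8))
pan = rng.random((8, 8))
w = np.full(4, 0.25)
g = np.ones(4)
fused_cs = cs_fuse(ms_up, pan, w, g)
fused_mra = mra_fuse(ms_up, pan, pan, g)  # with P_L = P, no details are injected
```

With P_L set equal to P, the MRA detail image P − P_L vanishes and the "fused" result equals the up-sampled MS input, which is a quick sanity check of the formulation.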

GS and GSA
GS is a representative method of the CS family; its fusion process is described by Equation (1), with the injection gains g_i given by Equation (4):

$g_i = \mathrm{cov}(\widetilde{MS}_i, I_L) / \mathrm{var}(I_L)$  (4)

where cov(X, Y) is the covariance between the two images X and Y, and var(X) is the variance of X. Several versions of GS can be obtained by changing the method for generating I_L. One way to obtain I_L is simply averaging the MS components (i.e., using Equation (2) with w_i = 1/N for all i = 1, ..., N). This modality is referred to as GS or GS mode 1. Another way is using the low-pass version of P as I_L; this modality is referred to as GS mode 2. An enhanced version, called adaptive GS (GSA), is obtained by assigning I_L as a weighted average of the original MS bands, as in Equation (2). The weights w_i in Equation (2) are calculated as the minimum mean square error (MMSE) solution of Equation (5):

$P_0 \approx \sum_{i=1}^{N} w_i MS_i + b$  (5)

where b is a bias term and P_0 is the degraded version of P with the same pixel size as the original MS bands. P_0 is generated by low-pass filtering of P, followed by decimation. Both the GS and GSA methods are included in the experiments in this study.
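The MMSE weight estimation used by GSA amounts to an ordinary least-squares regression of the degraded PAN on the MS bands; the function name, the bias column, and the synthetic data below are illustrative assumptions.

```python
# Minimal sketch of the GSA weight estimation (Equation (5)): regress the
# degraded PAN P0 on the original MS bands in the least-squares (MMSE) sense.
import numpy as np

def gsa_weights(ms, p0):
    """ms: (N, H, W) original MS bands; p0: (H, W) PAN degraded to MS scale."""
    n = ms.shape[0]
    A = ms.reshape(n, -1).T                        # one column per MS band
    A = np.column_stack([A, np.ones(A.shape[0])])  # allow a bias term
    coef, *_ = np.linalg.lstsq(A, p0.ravel(), rcond=None)
    return coef[:n]                                # weights w_i (bias dropped)

rng = np.random.default_rng(1)
ms = rng.random((4, 16, 16))
true_w = np.array([0.1, 0.2, 0.3, 0.4])
p0 = np.tensordot(true_w, ms, axes=1)  # synthetic PAN: exact mixture of the bands
w = gsa_weights(ms, p0)                # recovers true_w on this synthetic example
```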

HR
The HR method is based on the assumption that the ratio of an HSR MS band to a low-spatial-resolution (LSR) MS band is equal to the ratio of the HSR PAN image to an assumed LSR PAN image; it is a fusion method that takes haze into account [50][51][52]. The fused ith MS band $\widehat{MS}_i$ is calculated using Equation (6):

$\widehat{MS}_i = (\widetilde{MS}_i - H_i) \cdot (P - H_P) / (P_L - H_P) + H_i$  (6)

where P_L is a low-pass version of P, and H_i and H_P denote the haze values in the ith MS band and the PAN band, respectively. The values of H_i and H_P can be determined using the minimum grey-level values in $\widetilde{MS}_i$ and P according to an image-based dark-object subtraction method [50][51][52].
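A minimal sketch of the HR formula, assuming the haze values are simply taken as band minima (dark-object subtraction) and guarding the division; names and data are illustrative.

```python
# Sketch of HR fusion (Equation (6)): fused = (MS_i - H_i)*(P - H_p)/(P_L - H_p) + H_i
import numpy as np

def hr_fuse(ms_up, pan, pan_low):
    h_p = pan.min()                     # PAN haze: minimum grey level (assumption)
    ratio = np.ones_like(pan)
    np.divide(pan - h_p, pan_low - h_p, out=ratio,
              where=(pan_low - h_p) > 0)  # guard against division by zero
    fused = np.empty_like(ms_up)
    for i, band in enumerate(ms_up):
        h_i = band.min()                # band haze: minimum grey level (assumption)
        fused[i] = (band - h_i) * ratio + h_i
    return fused

rng = np.random.default_rng(2)
ms_up = rng.random((4, 8, 8)) + 0.1
pan = rng.random((8, 8)) + 0.1
fused = hr_fuse(ms_up, pan, pan)  # with P_L = P the ratio is 1, so the output
                                  # matches the input (up to rounding)
```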

HCS
HCS is a pansharpening method designed for WV-2 imagery, based on transforming the MS bands into the hyperspherical color space. The HCS method offers two modes: the native mode and the smart mode. The process of the native mode is as follows: (a) The square of the multispectral intensity (I²) and the square of the PAN band (P²) are calculated using Equations (7) and (8), respectively, with the intensity given by:

$I^2 = \sum_{i=1}^{N} (\widetilde{MS}_i)^2$  (7)

(b) Calculate the mean (u_P) and standard deviation (σ_P) of P², as well as the mean (u_I) and standard deviation (σ_I) of I². (c) P² is adjusted to the mean and standard deviation of I², using Equation (9):

$P^2_{adj} = (\sigma_I / \sigma_P)(P^2 - u_P) + u_I$  (9)

(d) The square root of the adjusted P² is assigned to I_adj (i.e., $I_{adj} = \sqrt{P^2_{adj}}$), and I_adj is used in the reverse transform from the HCS color space back to the original color space, using Equation (10), in which the angles φ_i are defined by Equation (11). For the smart mode, similarly to P², a PS² is calculated using the low-pass version of the PAN image (P_L), i.e., PS² = (P_L)². P_L is generated by an averaging filter of size 7 × 7. Both the means and standard deviations of PS² and P² are adjusted to those of I². Then, the adjusted intensity I_adj is assigned using Equation (12):

$I_{adj} = I \sqrt{P^2_{adj} / PS^2_{adj}}$  (12)

Finally, the fused image is generated using Equations (10) and (11). The HCS method using the smart mode is considered in the experiments in this study.

ATWT
The ATWT method is an MRA fusion approach that extracts spatial details using the "à trous" wavelet transform. The fusion scheme with the additive injection model can be formulated as Equation (13):

$\widehat{MS}_i = \widetilde{MS}_i + (P - P_L)$  (13)

where P_L is the low-frequency component of P, generated by the "à trous" wavelet. The "à trous" wavelet is a non-orthogonal wavelet, different from the orthogonal and biorthogonal wavelets. It is a redundant transform, since no decimation is implemented during the wavelet transform, whereas orthogonal and biorthogonal wavelet transforms can be carried out in either decimated or undecimated mode.
The whole ATWT fusion process can be divided into two steps [40]: (1) use the à trous wavelet transform to decompose the PAN image into n wavelet planes (usually, n = 2 or 3); (2) add the wavelet planes (i.e., spatial details) of the decomposed PAN image to each spectral band of the MS image to produce the fused MS bands.
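The two-step scheme above can be sketched as follows, assuming the commonly used B3-spline kernel for the à trous decomposition (the paper does not specify the kernel); the periodic border handling and all names are illustrative.

```python
# Sketch of the "a trous" decomposition: undecimated smoothing with holes,
# wavelet planes as successive differences; the transform is exactly invertible.
import numpy as np

B3 = np.array([1, 4, 6, 4, 1], dtype=float) / 16.0  # B3-spline kernel (assumption)

def atrous_smooth(img, level):
    """Separable B3 smoothing with 2**level holes between kernel taps."""
    step = 2 ** level
    out = img.astype(float)
    for axis in (0, 1):
        acc = np.zeros_like(out)
        for k, c in zip((-2, -1, 0, 1, 2), B3):
            acc += c * np.roll(out, k * step, axis=axis)  # periodic borders
        out = acc
    return out

def atwt_planes(pan, n=2):
    planes, current = [], pan.astype(float)
    for level in range(n):
        smoothed = atrous_smooth(current, level)
        planes.append(current - smoothed)  # wavelet plane = detail at this scale
        current = smoothed
    return planes, current

rng = np.random.default_rng(6)
pan = rng.random((16, 16))
planes, residual = atwt_planes(pan, n=2)
reconstructed = residual + sum(planes)  # telescoping sum recovers the input
```

Because the planes telescope (each plane is the difference of consecutive smoothings), adding all planes back to the residual reconstructs the original PAN image, which is why no decimation is needed.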

GLP
The fusion process of GLP can also be formulated as Equation (3). For the GLP method, the low-frequency component of the PAN band, P_L, is generated by up-sampling a down-sampled version of the original PAN band P. The down-sampling of P is implemented using a low-pass reduction filter that matches the modulation transfer function (MTF) of the corresponding MS band, whereas the up-sampling is carried out using an ideal expansion low-pass filter.
For GLP fusion, different detail injection models are designed to obtain the injection coefficients g_i. The most used models are the spectral distortion minimizing (SDM) model and the context-based decision (CBD) model. For the SDM model, the injection coefficients g_i are obtained using Equation (14):

$g_i(m, n) = \beta(m, n) \cdot \widetilde{MS}_i(m, n) / P_L(m, n)$  (14)

where β(m, n) is equal to 1 for all pixels in the original SDM model [42], whereas for the enhanced SDM (ESDM) model proposed by Aiazzi [43], β(m, n) is defined as the ratio between the average local standard deviation of the resampled MS bands and the local standard deviation of P_L:

$\beta(m, n) = \frac{1}{N} \sum_{k=1}^{N} \sigma_{\widetilde{MS}_k}(m, n) \; / \; \sigma_{P_L}(m, n)$  (15)

In the CBD model, the space-varying coefficients g_i are defined as in Equation (16):

$g_i(m, n) = \min\{ \sigma_{\widetilde{MS}_i}(m, n) / \sigma_{P_L}(m, n), \, c \}$ if $\rho_i(m, n) > \theta_i$, and $g_i(m, n) = 0$ otherwise  (16)

in which ρ_i(m, n) is the local correlation coefficient between $\widetilde{MS}_i$ and P_L, calculated on a square sliding window of size L × L centered on pixel (m, n). The CBD model is uniquely defined by the set of thresholds 0 < θ_i ≤ 1, for i = 1, ..., N, generally different for each band, and by the window size L, which depends on the spatial resolutions and scale ratio of the images to be merged, as well as on the landscape characteristics (typically, 7 ≤ L ≤ 11, to avoid loss of local sensitivity with L > 11 and statistical instability with L < 7). The thresholds may be related to the spectral content of the PAN image, e.g., θ_i = 1 − ρ_i, where ρ_i is the global correlation coefficient between the ith band and the PAN image spatially degraded to the same resolution. The clipping constant c is introduced to avoid numerical instabilities (empirically, 2 ≤ c ≤ 3).
For the enhanced CBD (ECBD) model proposed by Aiazzi [43], the coefficients g_i are calculated as in Equation (17), where ρ_i is the global correlation coefficient between $\widetilde{MS}_i$ and P_L. Both the GLP methods using the ESDM and the ECBD models are considered in the experiments.

NSCT
The contourlet transform (CT) has been shown to be a better approach than the wavelet transform for pansharpening. CT is implemented by a multiscale decomposition using the Laplacian pyramid, followed by a local directional transform using the directional filter bank (DFB).
The NSCT is a shift-invariant version of CT with excellent multilevel and multi-directional properties. NSCT is built upon non-subsampled pyramid filter banks (NSPFBs) and non-subsampled directional filter banks (NSDFBs). The NSPFB employed by NSCT is a 2-D two-channel non-subsampled filter bank, whereas the NSDFB is a shift-invariant version of the critically sampled DFB in CT. The details of the NSCT can be found in [28,53]. An improved version of the standard NSCT-based method is introduced in [28]. The process of the improved NSCT method with mode 2 (NSCT_M2) in [28] is described as follows. (e) The fused fine level 1, C_F_i^1, is obtained by fusing the coefficients of the same level obtained from both the ith MS band and the PAN band. For each pixel (x, y), the coefficient of the fused fine level 1, C_F_i^1(x, y), is determined according to Equation (18):

$C^1_{F_i}(x, y) = C^1_{MS_i}(x, y)$ if $LE_{MS_i}(x, y) \ge LE_{PAN}(x, y)$, and $C^1_{PAN}(x, y)$ otherwise  (18)

where LE_MS_i(x, y) and LE_PAN(x, y) are the local energies at pixel (x, y) for the ith MS band and the PAN band, respectively, calculated within a (2M + 1) × (2P + 1) window using Equation (19):

$LE(x, y) = \sum_{m=-M}^{M} \sum_{p=-P}^{P} \left[ C(x + m, y + p) \right]^2$  (19)

The inverse NSCT is applied to the fused coefficients to provide the fused ith MS band.
This improved version was demonstrated to provide pansharpened images with a good spectral quality.
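The local-energy fusion rule of Equations (18) and (19) can be sketched as follows; the 3 × 3 window, the names, and the toy coefficients are illustrative assumptions.

```python
# Sketch of the local-energy coefficient selection: at each pixel, keep the
# MS or PAN subband coefficient whose neighbourhood carries more energy.
import numpy as np

def local_energy(coeff, half=1):
    """Sum of squared coefficients in a (2*half+1)**2 window (edge-padded)."""
    h, w = coeff.shape
    padded = np.pad(coeff, half, mode="edge")
    energy = np.zeros_like(coeff, dtype=float)
    for dy in range(2 * half + 1):
        for dx in range(2 * half + 1):
            energy += padded[dy:dy + h, dx:dx + w] ** 2
    return energy

def fuse_fine_level(c_ms, c_pan, half=1):
    """Equation (18): pick the coefficient with the larger local energy."""
    return np.where(local_energy(c_ms, half) >= local_energy(c_pan, half),
                    c_ms, c_pan)

c_ms = np.zeros((8, 8))
c_ms[2, 2] = 5.0               # one strong MS detail coefficient
c_pan = np.full((8, 8), 0.1)   # weak PAN detail everywhere
fused = fuse_fine_level(c_ms, c_pan)
```

Near the strong MS coefficient the MS value wins; far from it, the PAN coefficients are selected because their local energy, although small, exceeds zero.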

Quality Indexes
Quality assessment of fusion products can be performed using two approaches. The first approach fuses images at a spatial resolution lower than the original resolution and uses the original MS image as a reference to assess the quality of the fused images. Several indexes have been proposed for evaluating the spatial and spectral distortions of a fused image with respect to an available reference image. The widely used spectral quality indexes include the Root Mean Square Error (RMSE), the Relative Average Spectral Error (RASE) [54], ERGAS, SAM, the Universal Image Quality Index (UIQI) [55], Q4 [46], Q2n, and the Peak Signal-to-Noise Ratio (PSNR) [56], whereas the widely used spatial indexes include SCC and the Structural SIMilarity index (SSIM) [57]. The second approach uses quality indexes that do not require a reference image but operate on relationships among the original images and the fusion products. This approach has the advantage of validating the products at the original scale, thus avoiding any hypothesis on the behavior at different scales; however, appropriate indexes requiring no reference must be used to assess the quality of the fusion products. The Quality with No Reference (QNR) index is one of the most widely used [58]. It is composed of the product of two separate values that quantify spectral and spatial distortions, respectively. However, QNR has proven to be less reliable than the indexes belonging to the first approach [30], since it can be affected by slight mismatches among the original image bands. The acquisition modality of WV-2 can lead to small temporal misalignments among the MS bands, since the eight MS bands are arranged in two arrays of four bands each. Consequently, the QNR index is not used in this study. Four quality indexes belonging to the first approach, namely ERGAS, SAM, Q2n, and SCC, are employed to assess the quality of the fused images at the data level. These indexes were chosen because they are widely used in the literature on the fusion of remote sensing imagery.

ERGAS
The global index ERGAS is an improved version of the Relative Average Spectral Error (RASE), which is defined based on the Root Mean Square Error (RMSE). ERGAS [44] is defined as follows:

$ERGAS = \frac{100}{R} \sqrt{ \frac{1}{N} \sum_{k=1}^{N} \left( \frac{RMSE_k}{u(MS_k)} \right)^2 }$  (20)

where u(MS_k) is the mean of the kth band of the reference image and R is the spatial resolution ratio between the MS and PAN bands; RMSE_k is defined as in Equation (21):

$RMSE_k = \sqrt{ \frac{1}{M} \sum_{j=1}^{M} \left( MS_k(j) - \widehat{MS}_k(j) \right)^2 }$  (21)

with M the number of pixels. The optimal value of ERGAS is 0, since it is defined as a weighted sum of RMSE values.
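A sketch of the ERGAS computation under the definitions above (R being the MS/PAN resolution ratio); names and data are illustrative.

```python
# Sketch of ERGAS (Equations (20)-(21)); lower is better, 0 is ideal.
import numpy as np

def ergas(reference, fused, ratio=4):
    """reference, fused: (N, H, W) MS cubes at the same scale."""
    n = reference.shape[0]
    acc = 0.0
    for k in range(n):
        rmse_k = np.sqrt(np.mean((reference[k] - fused[k]) ** 2))  # Equation (21)
        acc += (rmse_k / reference[k].mean()) ** 2
    return 100.0 / ratio * np.sqrt(acc / n)                        # Equation (20)

rng = np.random.default_rng(3)
ref = rng.random((8, 16, 16)) + 0.5
print(ergas(ref, ref))  # identical images give 0.0
```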

SAM
SAM expresses the spectral similarity between a fused image and a reference image as the average spectral angle over all pixels involved [45]. Let the two spectral vectors $V = \{V_1, V_2, \ldots, V_N\}$ and $\hat{V} = \{\hat{V}_1, \hat{V}_2, \ldots, \hat{V}_N\}$ represent the reference spectral pixel and the fused spectral pixel, respectively; their spectral angle is defined as in Equation (22):

$SAM(V, \hat{V}) = \arccos \left( \frac{\langle V, \hat{V} \rangle}{|V| \cdot |\hat{V}|} \right)$  (22)

where ⟨X, Y⟩ stands for the inner product of the two vectors X and Y, and |X| stands for the modulus of a vector X. The smaller the spectral angle, the higher the similarity between the two vectors. Since the angle is independent of the magnitudes of the two vectors, the SAM index is not affected by solar illumination factors.
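A sketch of the SAM computation, reporting the mean angle in degrees (the unit is an assumption; the paper does not state it); names and data are illustrative.

```python
# Sketch of SAM (Equation (22)): mean spectral angle between pixel vectors.
import numpy as np

def sam(reference, fused):
    """reference, fused: (N, H, W); returns the average angle in degrees."""
    ref = reference.reshape(reference.shape[0], -1)
    fus = fused.reshape(fused.shape[0], -1)
    inner = np.sum(ref * fus, axis=0)
    norms = np.linalg.norm(ref, axis=0) * np.linalg.norm(fus, axis=0)
    cos = np.clip(inner / np.maximum(norms, 1e-12), -1.0, 1.0)
    return np.degrees(np.mean(np.arccos(cos)))

rng = np.random.default_rng(4)
ref = rng.random((8, 16, 16)) + 0.1
print(sam(ref, ref))        # identical spectra: angle ~0 degrees
print(sam(ref, 2.0 * ref))  # scaling leaves the angle unchanged: ~0 degrees
```

The second call illustrates the magnitude-invariance property mentioned above: multiplying every spectrum by a constant does not change the angle.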

Q2n
Q2n is a generalization of the Q index for monoband images [47] and an extension of Q4 [46]. Q2n is derived from the theory of hypercomplex numbers, particularly of 2^n-ons [59,60]. A 2^n-on hypercomplex random variable z (in boldface) is written in the following form:

$z = z_0 + z_1 i_1 + \ldots + z_{2^n - 1} i_{2^n - 1}$  (23)

where z_0, z_1, ..., z_{2^n−1} are real numbers and i_1, ..., i_{2^n−1} are hypercomplex unit vectors. The conjugate z* is given by:

$z^* = z_0 - z_1 i_1 - \ldots - z_{2^n - 1} i_{2^n - 1}$  (24)

and the modulus |z| is defined by Equation (25):

$|z| = \sqrt{ z_0^2 + z_1^2 + \ldots + z_{2^n - 1}^2 }$  (25)

Given two 2^n-on hypercomplex random variables z and v, the hypercomplex covariance between z and v is defined as in Equation (26):

$\sigma_{zv} = E[(z - \bar{z})(v - \bar{v})^*]$  (26)

The hypercomplex CC between the two 2^n-on random variables is defined as the normalized covariance:

$CC_{zv} = \sigma_{zv} / (\sigma_z \sigma_v)$  (27)

in which σ_z and σ_v are the square roots of the variances of z and v, obtained by Equations (28) and (29), respectively:

$\sigma_z^2 = E\left[ |z - \bar{z}|^2 \right]$  (28)

$\sigma_v^2 = E\left[ |v - \bar{v}|^2 \right]$  (29)

The index Q2n can be computed over an M × M local window from Equation (30):

$Q2n_{M \times M} = \frac{|\sigma_{zv}|}{\sigma_z \sigma_v} \cdot \frac{2 \sigma_z \sigma_v}{\sigma_z^2 + \sigma_v^2} \cdot \frac{2 |\bar{z}| |\bar{v}|}{|\bar{z}|^2 + |\bar{v}|^2}$  (30)

where M is the size of the local window. Finally, Q2n is obtained by averaging the magnitudes of all Q2n_{M×M} over the whole image, according to Equation (31). According to [46], the suggested value of M is 32.
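Q2n builds on the scalar Q (UIQI) index applied to hypercomplex pixel representations. As a simple illustration, the monoband Q index can be sketched as follows, computed globally here rather than over 32 × 32 sliding windows (an assumption for brevity); names and data are illustrative.

```python
# Sketch of the monoband Q (UIQI) index: correlation x luminance x contrast.
import numpy as np

def q_index(x, y):
    """Universal Image Quality Index between two 2-D images (global version)."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return (4 * cov * mx * my) / ((vx + vy) * (mx ** 2 + my ** 2))

rng = np.random.default_rng(7)
img = rng.random((32, 32)) + 0.5
print(q_index(img, img))  # identical images give 1.0 (up to rounding)
```

Adding a constant bias lowers the luminance term, so `q_index(img, img + 0.5)` falls below 1, which is exactly the sensitivity to mean shifts that Q-family indices are designed to capture.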

SCC
To assess the spatial quality of the fusion products, the spatial details present in the fused images are compared with those present in the reference image by calculating the correlation coefficient between the spatial details extracted from the two images. Similar to the procedure proposed by Otazu et al. [48], the spatial information in the two images to be compared is extracted using a Laplacian filter; the correlations between the two filtered images are then calculated band by band. In this study, however, an overall correlation coefficient is calculated over the eight bands of the two edge images. A high SCC value indicates that many of the spatial details of the reference image are present in the fused image.
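The SCC procedure can be sketched as follows; the specific 3 × 3 Laplacian kernel and the border handling are assumptions, not necessarily the exact filter of Otazu et al.

```python
# Sketch of SCC: extract high-frequency detail with a Laplacian, then
# correlate the detail (edge) images of the reference and fused bands.
import numpy as np

def laplacian(img):
    """3x3 Laplacian (4-neighbour) applied to the image interior."""
    out = np.zeros_like(img, dtype=float)
    out[1:-1, 1:-1] = (img[:-2, 1:-1] + img[2:, 1:-1] +
                       img[1:-1, :-2] + img[1:-1, 2:] -
                       4.0 * img[1:-1, 1:-1])
    return out

def scc(reference_band, fused_band):
    edge_ref = laplacian(reference_band).ravel()
    edge_fus = laplacian(fused_band).ravel()
    return np.corrcoef(edge_ref, edge_fus)[0, 1]

rng = np.random.default_rng(5)
band = rng.random((32, 32))
print(scc(band, band))  # identical details: correlation ~1.0
```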

Information Indexes
In order to assess the information extraction ability of the fusion products in specific remote sensing applications, we assessed the quality of the fused images using a series of information indices, which have proven useful for land cover classification and information extraction in previous studies. Three indices are employed in this study: MBI, NDVI, and NDWI. The accuracy of an information index derived from a fusion product is assessed using the CC between that index and the same index derived from the corresponding reference MS image. A higher CC value implies a better information preservation ability of the fusion product in terms of that index. Henceforth, the CC values calculated for MBI, NDVI, and NDWI are denoted as C_MBI, C_NDVI, and C_NDWI, respectively. The three information indices employed are introduced in this subsection.

NDVI
Based on the principle that vegetation has a strong reflectance in the near-infrared (NIR) channel but a strong absorption in the red (Red) channel, the NDVI is defined as Equation (32):

$NDVI = (NIR - Red) / (NIR + Red)$  (32)

NDWI
The NDWI is defined using the spectral values of the green band (G) and the NIR band as Equation (33):

$NDWI = (G - NIR) / (G + NIR)$  (33)
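The two indices, and the CC used to score information preservation, can be sketched as follows; the band names and toy reflectance values are illustrative assumptions.

```python
# Sketch of NDVI and NDWI (Equations (32)-(33)) and the CC used to compare
# an index from a fused image with the same index from the reference image.
import numpy as np

def ndvi(nir, red):
    return (nir - red) / (nir + red)

def ndwi(green, nir):
    return (green - nir) / (green + nir)

def index_cc(index_fused, index_reference):
    """Correlation coefficient between two index maps."""
    return np.corrcoef(index_fused.ravel(), index_reference.ravel())[0, 1]

# Toy 2x2 reflectance patches (illustrative values)
nir = np.array([[0.6, 0.5], [0.4, 0.7]])
red = np.array([[0.2, 0.1], [0.3, 0.2]])
green = np.array([[0.3, 0.2], [0.5, 0.1]])
v = ndvi(nir, red)    # e.g. v[0, 0] = (0.6 - 0.2) / (0.6 + 0.2) = 0.5
w = ndwi(green, nir)
```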

MBI
The MBI was proposed by Huang et al. [49] to represent the spectral and spatial characteristics of buildings using a series of morphological operators. The MBI is calculated according to the following steps: (a) A brightness image b is generated by setting the value of each pixel to the maximum digital number of the visible bands; only the visible channels are considered because they make the most significant contributions to the spectral properties of buildings. (b) The directional white top-hat (WTH) reconstruction is employed to highlight bright structures with a size equal to or smaller than that of the structuring element (SE), while suppressing other dark structures in the image. The WTH with a linear SE is defined as Equation (34):

$WTH(s, \theta) = b - \gamma_{SE}(s, \theta)$  (34)

in which γ_SE(s, θ) is an opening-by-reconstruction operator using a linear SE with size s and direction θ. (c) The differential morphological profiles (DMP) of the white top-hat transforms are employed to model building structures in a multi-scale manner:

$DMP_{WTH}(s, \theta) = \left| WTH(s + \Delta s, \theta) - WTH(s, \theta) \right|$  (35)

(d) Finally, the MBI is defined based on the DMP using Equation (36):

$MBI = \frac{ \sum_{s, \theta} DMP_{WTH}(s, \theta) }{ N_D \times N_S }$  (36)

where N_D and N_S are the numbers of directions and scales of the DMPs, respectively. The definition of the MBI is based on the fact that building structures have high local contrast and, hence, large feature values in most directions of the profiles. Accordingly, buildings have large MBI values.

Datasets
Six datasets clipped from three WV-2 scenes were considered in this study. One scene covers Beijing City, China, whereas the other two scenes cover Pingdingshan City, Hebei Province, China. The Beijing scene was acquired on 21 September 2013, with an off-nadir angle of 13.7°. One of the Hebei scenes was acquired on 29 April 2014, with an average off-nadir angle of 12.5°, whereas the other was obtained on 21 August 2014, with an average off-nadir angle of 8.6°. The locations of the three scenes are shown in Figure 1. The red rectangles in the figure indicate the locations of the six datasets. Two datasets covering urban areas were selected from the Beijing scene, whereas the other four datasets were obtained from the other two WV-2 scenes; two of them cover suburban areas and the other two cover rural areas. For all the datasets, the radiometric resolution is 16 bits and the spatial resolution ratio R is 4. Each dataset has a size of 256 × 256 pixels at the MS scale. The two urban images are henceforth referred to as I1 and I2, the two suburban images as I3 and I4, and the two rural images as I5 and I6. The typical image objects shown in the six images are listed in Table 1; the six images are shown in Figure 2.
To evaluate the quality of the fusion products with respect to the original MS images, degraded datasets were produced by reducing the original MS and PAN images to spatial resolutions of 8 m and 2 m, respectively. All the fusion experiments were performed on the degraded datasets. The fused 2-m images were compared with the 2-m true MS images to assess their quality. The degraded images were generated by averaging in this study [61]. The spatial resolution ratio R is 4 for all the tested images.
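The reduced-resolution protocol described above can be sketched as follows, assuming simple block averaging with the scale ratio R = 4; names and data are illustrative.

```python
# Sketch of the degraded-dataset generation: downsample by averaging
# non-overlapping R x R blocks (R = 4), as in the reduced-resolution protocol.
import numpy as np

def block_average(img, r=4):
    """Downsample a 2-D image by averaging non-overlapping r x r blocks."""
    h, w = img.shape
    return img.reshape(h // r, r, w // r, r).mean(axis=(1, 3))

pan = np.arange(8 * 8, dtype=float).reshape(8, 8)
pan_degraded = block_average(pan, r=4)  # 8x8 -> 2x2
```

Fusion is then run on the degraded pair, and the fused result is compared against the original MS image, which serves as the reference at the fused scale.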
According to the image objects included in the six test images, different information indices were chosen for each image (Table 1). The fusion products of the two urban images were assessed with respect to C_MBI and C_NDVI, since no water bodies can be observed in these two images. The fusion products of the two suburban images were assessed using all three indices, whereas those of the two rural images were assessed using only C_NDVI and C_NDWI.

Fusing Using the Selected Algorithms
All the fusion algorithms were implemented in MATLAB version 2014b. For all eight selected fusion methods, the up-sampled MS images were produced using bi-cubic interpolation. For the HR method, the low-spatial-resolution version of the PAN image (P_L) was obtained by averaging the PAN pixels in an R × R window, followed by bi-cubic up-sampling to the resolution of the original PAN image. The GLP methods using the ESDM model and the ECBD model were used in the fusion experiments; the two methods are referred to as GLP_ESDM and GLP_ECBD, respectively, in this study. For the HR method, the haze values for the MS and PAN bands were determined using the values of the pixel offering the lowest value in the PAN image.

Assessment for the Two Urban Images
The image quality indices of the fusion products of the two urban images are shown in Table 2. The fusion products of these two images are partly shown in Figures 3 and 4, respectively, in order to facilitate the observation of the details of the fused images. The images in each figure in this work are stretched by using an identical histogram obtained from the corresponding reference MS images.


The HR method offers the highest Q8 and SCC values and the lowest ERGAS and SAM values for the two urban images, indicating that the HR method gives the best performance in terms of both spectral and spatial quality indices. The NSCT_M2 method yields the lowest Q8 and SCC values and the highest ERGAS and SAM values, indicating the poorest performance in spectral preservation. The GSA method provides Q8 values slightly lower than the highest values provided by the HR method, followed by the GS, GLP_ESDM, GLP_ECBD, ATWT, and HCS methods. The GLP_ESDM method performs the best among the MRA methods in terms of both spectral and spatial quality indices. The HCS method yields the poorest performance among the CS methods in terms of all four quality indices for the two urban images.
Visual comparisons of the fusion products of the two urban images (Figures 3 and 4) show that the products generated by the GS, HCS, and NSCT_M2 methods exhibit significant spectral distortions in shadow-covered regions; this is especially obvious in the fusion products of I2. In addition, the fused image generated by the GLP_ECBD method for I2 also shows significant spectral distortions in shadow-covered regions, which is consistent with the poor performance of the GLP_ECBD method in terms of the quality indexes. Although the fused images generated by the ATWT and GLP_ECBD methods show sharper boundaries between different objects than the other fusion products, they appear over-sharpened and show spectral distortions; this is very obvious in the fusion products of I2. The two HCS-fused images are more blurred than the other fusion products. It can also be observed that the fusion products generated by the GSA, HR, and GLP_ESDM methods yield lower spectral distortions and sharper boundaries between different objects than the other fusion products. In addition, the HR-fused images provide more texture details in vegetation-covered areas.
According to the quality indexes and visual inspection, the HR, GSA, and GLP_ESDM methods perform better than the other methods for the two urban images, whereas the NSCT_M2 and HCS methods perform the worst.

Assessment for the Two Suburban Images
The image quality indices of the fusion products of the two suburban images are shown in Table 3. Similarly, parts of the fusion products of the two images are shown in Figures 5 and 6, respectively. The GSA method offers the best performance in terms of ERGAS, SAM, and Q8, whereas the GS method provides the highest SCC values for the two suburban images. The NSCT_M2 method gives the poorest performance in terms of ERGAS, SAM, and Q8. Among the MRA methods, the GLP_ECBD method offers the highest Q8 values, whereas the ATWT method provides the highest SCC values. Although the HR method provides Q8 values only slightly lower than the highest values provided by the GSA method, it offers the lowest SCC values among the CS methods. The HR method also provides the highest SAM values among the CS methods. Thus, although the HR method gives the best performance for the two urban images, it performs relatively poorly for the two suburban images.
Visual comparisons of the fusion products of the two suburban images show that obvious spectral distortions can be found in the water-body and shadow-covered regions of the fused images generated by the GS, HCS, ATWT, and NSCT_M2 methods. In addition, the fusion products of the HCS and NSCT_M2 methods are significantly more blurred than the other products, because few spatial details are injected into the fused images. The fusion products generated by the HR, GLP_ESDM, and GLP_ECBD methods are sharper and yield lower spectral distortions than the other fusion products. Although the GSA method gives the best performance in terms of Q8, the corresponding fusion products appear more blurred than those generated by the HR, GLP_ESDM, and GLP_ECBD methods. Although the GS method offers the highest SCC values, the two GS-fused images show significant spectral distortions. The two HR-fused images also show very blurred boundaries between vegetation and non-vegetation objects, which may contribute to the relatively low SCC values and high SAM values provided by this method. According to the quality indexes and visual inspection, the GLP_ECBD method performs better than the other methods for the two suburban images.

Assessment for the Two Rural Images
The image quality indices of the fusion products of the two rural images are shown in Table 4, whereas sub-images of the fusion products of the two images are shown in Figures 7 and 8, respectively. The eight fusion methods perform differently on I5 and I6, which may be due to the fact that the land cover types of the two scenes are obviously different. For I5, the HR and GLP_ESDM methods give the highest Q8 and SCC values; the GS and NSCT_M2 methods provide the lowest Q8 values; and the GSA method yields the lowest SCC values. However, for I6, the HR, GLP_ECBD, and ATWT methods offer the highest Q8 values, whereas the ATWT, NSCT_M2, and GLP_ECBD methods provide the highest SCC values. The HCS and GS methods provide the lowest Q8 and SCC values, respectively, for I6. The GLP_ESDM method gives a slightly better performance than the GLP_ECBD method for I5; in contrast, the former performs worse than the latter for I6.
Obvious spectral distortions can be observed in the fused images generated by the GS, GSA, and HCS methods for I5, especially in the water-body-covered regions and rooftop regions. The fusion products generated by the GS, GSA, and HCS methods for I5 are also more blurred than the other fusion products. Conversely, the fused image generated by the GLP_ECBD method for I5 appears over-sharpened, owing to the over-injection of spatial details. Consequently, the HR and GLP_ESDM methods give the best performances for I5, with respect to both the quality indices and visual inspection.
The eight fusion products for I6 show no significant differences. Among them, the product generated by the NSCT_M2 method is the most blurred, whereas pixels corresponding to vegetation in the product generated by the GLP_ESDM method appear too light in color. The HR-fused image shows slightly more details than the other products, especially in vegetation-covered regions. With respect to the quality indices and visual inspection, the HR, GLP_ECBD, and ATWT methods yield the best performances for I6.

Information Preservation
For each of the fusion products, a CC between an information index derived from the fused image and the same index derived from the corresponding reference MS image was calculated. The CC values for MBI (C_MBI), NDVI (C_NDVI), and NDWI (C_NDWI) for the fusion products of the six test datasets are listed in Table 5. The fusion products generated from the two urban (I1 and I2) and two suburban (I3 and I4) images were assessed in terms of C_MBI. All the fusion products of the two urban images offer C_MBI values that are significantly higher than those of the up-sampled MS images (EXPs), indicating that all the fusion products show obvious improvements in terms of C_MBI. This is because spatial details extracted from the PAN bands were injected to produce these images. Generally, the performances of the eight methods in terms of C_MBI are consistent with those in terms of Q8 and SCC. The HR, GSA, and GS methods offer the highest C_MBI values, whereas the NSCT_M2 and HCS methods offer the lowest C_MBI values, for both the urban and suburban test images. The CS methods give slightly better performances than the MRA methods in terms of C_MBI for the four test images. This is consistent with the performances of the eight methods in terms of the quality indices for these images, and also with the conclusion of previous studies, which imply that the fusion products generated by CS methods offer good visual and geometrical impressions [31].
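The information-preservation measure used above reduces to a Pearson CC between two index maps. A minimal sketch, assuming the index maps (MBI, NDVI, or NDWI) have already been derived from the fused and reference images:

```python
import numpy as np

def index_correlation(idx_fused, idx_reference):
    """Pearson CC between an information index (MBI, NDVI, or NDWI)
    derived from a fused image and the same index derived from the
    reference MS image (the C_MBI / C_NDVI / C_NDWI measure)."""
    a = np.asarray(idx_fused, dtype=np.float64).ravel()
    b = np.asarray(idx_reference, dtype=np.float64).ravel()
    a = a - a.mean()
    b = b - b.mean()
    return float((a @ b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

A value close to 1 indicates that the fused image preserves the spatial pattern of the index; values near or below that of the up-sampled MS image (EXP) indicate little or no improvement.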
Among the MRA methods, the GLP_ESDM method offers the best performance for the two urban images, whereas the GLP_ECBD method performs best for the two suburban images, in terms of C_MBI. This is also consistent with the performances of the two methods in terms of the quality indices. According to the assessment results with respect to the quality indices and visual inspection, the HR, GSA, and GLP_ESDM methods outperform the other methods for the two urban images, whereas the GLP_ECBD method gives the best performances for the two suburban images. Although these four methods perform differently across different image scenes, they outperform the other methods for all four images in terms of C_MBI, as well as the quality indices and visual inspection. Consequently, the HR, GSA, GLP_ESDM, and GLP_ECBD methods may be the best choices for producing fused urban/suburban WV-2 images used for image interpretation and building extraction.

C_NDVI and C_NDWI
The fusion products of all six images were assessed using C_NDVI, whereas only the fusion products of the two suburban images and the two rural images were assessed using C_NDWI. We discuss the two indices together because both measure the differences between the inter-band relationships of the fused image and those of the reference MS image. In addition, both of them are related to the NIR bands.
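Both indices are simple per-pixel normalized band differences. A minimal sketch using conventional band pairs (NIR1 and R for NDVI, and G and NIR1 for the McFeeters form of NDWI; the exact WV-2 band choices are an assumption here, since NDWI in particular has several variants):

```python
import numpy as np

def ndvi(nir, red, eps=1e-12):
    """Normalized difference vegetation index: (NIR - R) / (NIR + R)."""
    nir = np.asarray(nir, dtype=np.float64)
    red = np.asarray(red, dtype=np.float64)
    return (nir - red) / (nir + red + eps)

def ndwi(green, nir, eps=1e-12):
    """Normalized difference water index: (G - NIR) / (G + NIR)."""
    green = np.asarray(green, dtype=np.float64)
    nir = np.asarray(nir, dtype=np.float64)
    return (green - nir) / (green + nir + eps)
```

Because both formulas divide a band difference by a band sum, any spectral distortion of the fused NIR band propagates directly into both indices, which is why C_NDVI and C_NDWI tend to rank the methods similarly.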
It can be observed that some of the fusion products offer C_NDVI or C_NDWI values that are lower than those provided by the corresponding up-sampled MS images (EXPs), and the highest C_NDVI values are only slightly higher than those of the EXPs. This indicates that the fusion products show limited improvements in terms of C_NDVI and C_NDWI, which may be caused by the spectral distortions of the fused NIR bands.
The HCS method offers relatively high C_NDVI and C_NDWI values for all the test images. This may be due to the fact that the inter-band relationships of the up-sampled LSR MS bands are preserved in the corresponding HCS-fused images. However, since the HCS-fused images show significant spectral distortions in terms of both the quality indices and visual inspection, the method is regarded as performing poorly for specific applications. Hence, we do not discuss its performance further in this section.
For the two urban images, the GLP_ESDM, HR, and GS methods offer the highest C_NDVI and C_NDWI values. Given that the GS-fused images show obvious spectral distortions, and that only the GLP_ESDM method offers higher C_NDVI and C_NDWI values than those of the corresponding EXPs, the GLP_ESDM method is a better choice for producing fusion products to be used in applications related to urban vegetation and water-bodies.
For I3 and I4, the GS, ATWT, and GLP_ECBD methods provide the highest C_NDVI and C_NDWI values, whereas the HR and NSCT_M2 methods offer the lowest. Given that the GS and ATWT methods perform poorly in terms of the quality indices and visual inspection, the GLP_ECBD method gives the best performances for the two suburban images in terms of NDVI and NDWI information preservation, as well as the quality indices and visual inspection.
For I5, the GLP_ESDM, ATWT, and HR methods offer higher C_NDVI and C_NDWI values than the other methods. Given that the ATWT method performs poorly in terms of the quality indices and visual inspection, and that only the GLP_ESDM method offers higher C_NDVI and C_NDWI values than those of the corresponding EXPs, the GLP_ESDM method is the best choice for rural WV-2 images with image objects similar to those of I5, in terms of NDVI and NDWI information preservation, as well as the quality indices and visual inspection.
For I6, the GLP_ECBD, ATWT, and GLP_ESDM methods provide higher C_NDVI and C_NDWI values than the other methods. Given that the HR, GLP_ECBD, and ATWT methods outperform the other methods in terms of the quality indices and visual inspection, and that only the GLP_ECBD method offers higher C_NDVI and C_NDWI values than those of the corresponding EXPs, the GLP_ECBD method is the best choice for rural WV-2 images with image objects similar to those of I6, in terms of NDVI and NDWI information preservation, as well as the quality indices and visual inspection.

Discussion
Generally, comparisons of different pansharpening methods are performed by assessing fusion products using spectral and spatial quality indices, as well as visual inspection. However, a good performance in terms of quality indices and visual inspection does not always make a method a good choice for every application. The NDVI, NDWI, and MBI indices, which are widely used in applications related to land cover classification and the extraction of vegetation areas, buildings, and water bodies, were therefore employed in this study to evaluate the performances of the selected pansharpening methods in terms of their information presentation ability. Specifically, the performances of eight selected state-of-the-art pansharpening methods were assessed using information indices (NDVI, NDWI, and MBI), along with current image quality indices (ERGAS, SAM, Q2n, and SCC) and visual inspection, with six datasets from three WV-2 scenes.

General Performances of the Selected Pansharpening Methods
Generally, the HR, GSA, GLP_ESDM, and GLP_ECBD methods give better performances than the other methods, whereas the NSCT and HCS methods offer the poorest performances, for most of the test images, in terms of the quality indices and visual inspection. The four best-performing methods also give slightly different performances for images containing different image objects. For example, the HR, GSA, and GLP_ESDM methods give the best performances for the two urban images, whereas the GLP_ECBD method provides the best performances for the two rural images. However, the fusion products of the four methods offer good visual quality for most images. Consequently, the HR, GSA, GLP_ESDM, and GLP_ECBD methods are good choices if the fused WV-2 images are to be used for image interpretation.
The results of the assessments using the three information indices show that the ranking of the eight selected fusion methods in terms of C_MBI is broadly consistent with those in terms of Q8 and SCC. This may indicate that an assessment using only the quality indices and visual inspection is sufficient for selecting the best fusion method for producing fused urban WV-2 images used for image interpretation and applications related to urban buildings. The ranking of the eight methods in terms of C_NDVI is similar to that in terms of C_NDWI. This is because both C_NDVI and C_NDWI measure the differences between the inter-band relationships of a fused image and those of the corresponding reference MS image. In contrast, the rankings of the eight methods in terms of C_NDVI and C_NDWI are significantly different from those in terms of Q8 and SCC. This indicates that a fusion method offering the best performance for a certain image in terms of the quality indices and visual inspection does not always provide the highest C_NDVI and C_NDWI values. Generally, the GLP_ESDM method outperforms the other methods for I1, I2, and I5, whereas the GLP_ECBD method provides the best performances for I3, I4, and I6, in terms of C_NDVI and C_NDWI, as well as the quality indices and visual inspection. This indicates that GLP_ESDM is the best choice for images with objects similar to those of I1, I2, and I5, whereas GLP_ECBD is the best choice for images with objects similar to those of I3, I4, and I6, for producing fusion products used in applications related to vegetation or water-bodies. In addition, the fusion products show limited improvements in terms of C_NDVI and C_NDWI, indicating that it is hard for the fusion products to preserve the NDVI and NDWI information obtained from the corresponding up-sampled MS images.
Consequently, it is necessary to evaluate fusion products using information indices (i.e., NDVI and NDWI) if fused WV-2 images will be used for applications related to vegetation and water-bodies.

Effects of the Different Spectral Ranges of the PAN and MS Bands
A noticeable characteristic of WV-2 is that the spectral range of the PAN band covers only a limited portion of the spectral ranges of the C, NIR1, and NIR2 bands. This results in relatively low correlation coefficients between these bands and the PAN band. It is therefore interesting to examine the performances of the selected pansharpening methods on the two NIR bands and the C band of WV-2. In order to assess the spectral distortion of each fused band, the CC value between each fused band and the corresponding reference band was calculated for each fusion product. The CC values for the fusion products of I1, I3, I5, and I6 are shown in Table 6. The CC values of I2 and I4 are not presented because they are similar to those of I1 and I3, respectively.
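The per-band measure behind Table 6 can be sketched as a straightforward band-wise Pearson CC, assuming the fused and reference images are co-registered (bands, rows, cols) arrays:

```python
import numpy as np

def per_band_cc(fused, reference):
    """CC between each fused band and the corresponding reference band;
    lower values indicate stronger spectral distortion in that band."""
    ccs = []
    for f, r in zip(np.asarray(fused, dtype=np.float64),
                    np.asarray(reference, dtype=np.float64)):
        f = f.ravel() - f.mean()
        r = r.ravel() - r.mean()
        ccs.append(float((f @ r) / (np.linalg.norm(f) * np.linalg.norm(r))))
    return ccs
```

Applied to an eight-band WV-2 product, this yields one CC per band, so the C, NIR1, and NIR2 entries can be compared directly against the visible bands.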
It can be seen from Table 6 that the CC values of the two NIR bands are significantly lower than those of the other bands for all the fusion products, indicating that the fused NIR bands show more spectral distortions than the other bands. This is caused by the relatively low correlation coefficients between the two NIR bands and the PAN band, and is consistent with previous studies: the higher the correlation between the PAN band and each MS band, the more successful the fusion [31]. Generally, the four CS methods offer higher CC values for the two NIR bands than the four MRA methods for most of the test images, which is consistent with the visual inspection of these fusion products. However, the two GLP-based methods, which provide good performances in terms of NDVI and NDWI information preservation, offer relatively low CC values for the fused NIR1 band, owing to the low CC between the PAN and NIR1 bands. This again demonstrates that it is necessary to evaluate fusion products using information indices (i.e., NDVI and NDWI) if the fused WV-2 images are to be used in applications related to vegetation and water-bodies.

How to Extend the Selected Pansharpening Methods to Other HSR Satellite Images
As described in the previous sections, the HR, GSA, GLP_ESDM, and GLP_ECBD methods are good choices for producing fused WV-2 images used for image interpretation and applications related to urban buildings, and the two GLP-based methods outperform the other methods for generating fused WV-2 images used in applications related to vegetation and water-bodies. Readers may wonder whether these methods give similar performances for sensors with a PAN spectral range similar to that of WV-2, such as GeoEye-1 and WorldView-3/4. The selected pansharpening methods can be categorized into two groups, according to the approach employed to generate the synthetic PAN band P_L, which mainly contains the low-frequency component of the original PAN band. For the first group, P_L is generated by applying filters to the original PAN band, or by up-sampling a degraded version of the original PAN band. In contrast, for the second group, the intensity image I_L, which can be seen as another form of the synthetic PAN band P_L, is generated as a weighted combination of the LSR MS bands. The methods belonging to the first group include the HR, ATWT, NSCT, and the two GLP-based methods, whereas the methods belonging to the second group include the GS, GSA, and HCS methods. For the first group, the low-frequency component of P_L has relatively low correlations with the C, NIR1, and NIR2 bands, but relatively high correlations with the other spectral bands. As a result, the details of the PAN band have relatively high correlations with the B, G, Y, R, and RE bands, but relatively low correlations with the C, NIR1, and NIR2 bands.
Consequently, a large amount of spatial detail is injected into the B, G, Y, R, and RE bands, but only a small amount into the C, NIR1, and NIR2 bands, especially when the injection gains are determined by considering the relationship between each MS band and the PAN band. For the second group, the low-frequency component of the intensity image is related, or partly related, to the C, NIR1, and NIR2 bands. As a result, the low-frequency component of the C, NIR1, and NIR2 bands may be injected into the B, G, Y, R, and RE bands, leading to spectral distortions of these bands. An exception is GSA, since the intensity image I_L employed by the GSA method has low CCs with the C, NIR1, and NIR2 bands, because the weights w_i obtained using Equation (5) are very low for these bands.
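The two routes to the synthetic low-resolution PAN image described above can be sketched as follows; the box filter and the example weights are illustrative assumptions, not the MTF-matched filters or regression weights used by the evaluated methods:

```python
import numpy as np

def pl_from_pan(pan, size=5):
    """Group 1 (HR, ATWT, NSCT, GLP-based): P_L obtained by low-pass
    filtering the original PAN band (separable box filter as a stand-in)."""
    k = np.ones(size) / size
    out = np.apply_along_axis(lambda row: np.convolve(row, k, mode="same"), 1,
                              np.asarray(pan, dtype=np.float64))
    return np.apply_along_axis(lambda col: np.convolve(col, k, mode="same"), 0, out)

def pl_from_ms(ms_bands, weights):
    """Group 2 (GS, GSA, HCS): intensity image I_L as a weighted
    combination of the up-sampled LSR MS bands."""
    w = np.asarray(weights, dtype=np.float64)
    return np.tensordot(w, np.asarray(ms_bands, dtype=np.float64), axes=1)
```

The distinction matters for the weakly correlated bands: in the first group P_L carries no MS spectral content at all, whereas in the second group the weights determine how much of the C, NIR1, and NIR2 signal leaks into I_L.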
As introduced in the descriptions of the algorithms of the selected methods, different injection gains g_i are employed by these methods. The GS and GSA methods use a band-dependent model that considers the relationship between each MS band and the PAN band. The GSA method outperforms the GS method because the intensity image I_L employed by the former has low CC values with the C, NIR1, and NIR2 bands. It can be seen from Table 6 that the CC values for the B, G, Y, R, and RE bands of the GSA-fused images are significantly higher than those of the GS-fused images. The HR method uses the SDM model, which is also band-dependent. The ATWT method employs a simple additive injection model with weights equal to 1 for each band, whereas the two GLP-based methods use the ESDM and ECBD models, respectively. Among these models, only the ESDM and ECBD models consider the local dissimilarity between the MS and PAN bands. According to the experimental results, the two GLP-based methods give good performances in terms of NDVI and NDWI information preservation, which may be due to this consideration of local dissimilarity. It has also been demonstrated by previous studies that local dissimilarity between the MS and PAN bands should be considered by pansharpening methods to reduce spectral distortions.
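The detail-injection step these methods share can be written generically as MS_fused,i = MS_i + g_i (PAN - P_L). A band-dependent, covariance-based gain of the GS/GSA type can be sketched as follows (this projection gain is used here purely for illustration; the SDM, ESDM, and ECBD models use more elaborate, and in the latter two cases locally adaptive, gains):

```python
import numpy as np

def inject_details(ms_band, pan, p_l):
    """Generic detail injection: fused_i = MS_i + g_i * (PAN - P_L),
    with g_i = cov(MS_i, P_L) / var(P_L), a GS-style projection gain."""
    ms = np.asarray(ms_band, dtype=np.float64)
    pl = np.asarray(p_l, dtype=np.float64)
    # np.cov defaults to ddof=1, so the variance must match.
    g = np.cov(ms.ravel(), pl.ravel())[0, 1] / np.var(pl.ravel(), ddof=1)
    return ms + g * (np.asarray(pan, dtype=np.float64) - pl)
```

Because g_i shrinks with the covariance between the MS band and P_L, a band that is weakly correlated with the synthetic PAN (e.g., NIR1 or NIR2) automatically receives fewer injected details, which is the mechanism discussed above.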
From the above analyses of the algorithms of the selected pansharpening methods, we can draw the following conclusions. Firstly, for the spectral bands with relatively high correlations with the PAN band, the synthesized PAN band should be obtained from the original PAN band, and the injection gains should consider the relationship between each MS band and the PAN band. Secondly, for the spectral bands with relatively low correlations with the PAN band, further experiments should be designed to evaluate which approach is better for generating the synthesized PAN band. However, there is no doubt that local dissimilarity between the MS and PAN bands should be considered for the fusion of these bands, i.e., the NIR bands, especially when the fused images will be used in applications related to vegetation and water-bodies.
According to this analysis, we can conclude that the GSA, HR, GLP_ESDM, and GLP_ECBD methods can also provide good performances for similar sensors, such as GeoEye-1, WorldView-3, and WorldView-4, in cases where the fusion products will be used for image interpretation or applications related to urban buildings. Indeed, previous studies have shown that the performances of these newly proposed methods are sensor-independent [30]. However, when the fusion products will be used in applications related to vegetation or water-bodies, the GLP_ESDM and GLP_ECBD methods, or other fusion methods that consider the local dissimilarity between the MS and PAN bands, are better choices.

Conclusions
The performances of eight state-of-the-art pansharpening methods for WV-2 imagery were assessed using information indices (NDVI, NDWI, and MBI), along with current image quality indices (ERGAS, SAM, Q2n, and SCC) and visual inspection, with six WV-2 datasets. The main findings and conclusions derived from our analyses are as follows: (1) Generally, the HR, GSA, GLP_ESDM, and GLP_ECBD methods give better performances than the other methods, whereas the NSCT and HCS methods offer the poorest performances, for most of the test images, in terms of the quality indices and visual inspection. Some of the fusion products generated by the GS and ATWT methods show significant spectral distortions. In addition, the performances of the eight methods in terms of C_MBI are consistent with those in terms of Q8 and SCC. Consequently, the HR, GSA, GLP_ESDM, and GLP_ECBD methods are good choices if the fused WV-2 images will be used for image interpretation and applications related to urban buildings. These four methods can also provide good performances for other WV-2 image scenes when producing fused images for image interpretation. (2) The ranking of the pansharpening methods in terms of C_NDVI is consistent with that in terms of C_NDWI. This is because both indices measure the differences between the inter-band relationships of the fused image and those of the reference MS image, and both are related to the quality of the fused NIR1 band. The GLP_ESDM method offers higher C_NDVI and C_NDWI values for I1, I2, and I5, whereas the GLP_ECBD method provides higher C_NDVI and C_NDWI values for I3, I4, and I6, together with good performances in terms of the quality indices and visual inspection. Consequently, the GLP_ESDM and GLP_ECBD methods are better than the other methods if the fused WV-2 images will be used in applications related to vegetation and water-bodies.
However, in this case, it is better to select the best method by comparing the C_NDVI and C_NDWI indices, as well as the quality indices and visual inspection, since the GLP_ESDM and GLP_ECBD methods may perform differently for images with different land cover objects.
(3) According to the experimental results of this work and the analyses of the algorithms of the selected pansharpening methods, we can offer two suggestions for the fusion of images obtained by sensors similar to WV-2, such as GeoEye-1 and WorldView-3/4. Firstly, for the spectral bands with relatively high correlations with the PAN band, the synthesized PAN band should be obtained from the original PAN band, and the injection gains should consider the relationship between each MS band and the PAN band. The HR, GSA, GLP_ESDM, and GLP_ECBD methods can also offer good performances for scenes obtained by GeoEye-1 and WorldView-3/4, for producing fused images used for interpretation and applications related to urban buildings. Secondly, for the spectral bands with relatively low correlations with the PAN band, the local dissimilarity between the MS and PAN bands should be considered for the fusion of these bands, i.e., the NIR bands, especially when the fused images will be used in applications related to vegetation and water-bodies.