A Novel Pan-Sharpening Framework Based on Matting Model and Multiscale Transform

Pan-sharpening aims to sharpen a low spatial resolution multispectral (MS) image by combining the spatial detail information extracted from a panchromatic (PAN) image. An effective pan-sharpening method should produce a high spatial resolution MS image while preserving more spectral information. Unlike traditional intensity-hue-saturation (IHS)and principal component analysis (PCA)-based multiscale transform methods, a novel pan-sharpening framework based on the matting model (MM) and multiscale transform is presented in this paper. First, we use the intensity component (I) of the MS image as the alpha channel to generate the spectral foreground and background. Then, an appropriate multiscale transform is utilized to fuse the PAN image and the upsampled I component to obtain the fused high-resolution gray image. In the fusion, two preeminent fusion rules are proposed to fuse the lowand high-frequency coefficients in the transform domain. Finally, the high-resolution sharpened MS image is obtained by linearly compositing the fused gray image with the upsampled foreground and background images. The proposed framework is the first work in the pan-sharpening field. A large number of experiments were tested on various satellite datasets; the subjective visual and objective evaluation results indicate that the proposed method performs better than the IHSand PCA-based frameworks, as well as other state-of-the-art pan-sharpening methods both in terms of spatial quality and spectral maintenance.


Introduction
High-resolution multichannel satellite images with both high spatial resolution and spectral diversity are required in many image-processing applications, such as change detection, land-cover segmentation, and road extraction.However, due to the limits on the signal-to-noise ratio, these acquisitive images cannot be achieved only through a single sensor [1].Thus, most remote sensing satellites (e.g., WorldView-2, QuickBird, IKONOS) simultaneously provide a panchromatic (PAN) image with high spatial but low spectral resolution, and a multispectral (MS) image with complementary properties [2,3].Then, a pan-sharpening algorithm is applied to merge the MS and PAN images to have a high spatial resolution MS image, preserving the spectral information of MS.When the spatial details are obtained from a multispectral/hyperspectral sequence, the pan-sharpening algorithm is called hyper-sharpening [4].
In the last two decades, many pan-sharpening methods have been proposed to fuse MS and PAN images.A detailed comparison of the characteristics and performance of classical and famous pan-sharpening methods was presented in [5].Initial efforts mainly focus on the component substitution-based (CS) methods, such as the principal component analysis (PCA) method [6], the intensity-hue-saturation (IHS) method [7,8] and the Gram-Schmidt (GS) method [9].The primary concept of these CS-based methods is to first transform the upsampled MS images to a new space.If one component of the new space contains equivalent structures to those of the PAN image, fusion occurs by totally or partially substituting this component with the PAN image.Finally, a corresponding inverse transform is performed to obtain the high-resolution pan-sharpened image.These CS-based methods are fast and easy to implement, but are not adaptable to the latest generation of high-resolution remote sensing images.A common problem that arises with these methods is the serious color change due to spectral distortion.The main reason for the spectral distortion is that the wavelength range of the new satellite PAN image is extended from the visible to the near infrared [10].A possible simple modification of these schemes, also applicable to CS methods, replaces the interpolated MS image with its deblurred version, where the deblurring kernel is matched to the modulation transfer function (MTF) of the MS sensor [11].Recently, pan-sharpening has also been formulated as a compressive sensing reconstruction problem, but this scheme has high computation complexity [12,13].
Multiresolution analysis (MRA)-based methods have attracted interest in the pan-sharpening field [14,15].These include traditional pyramid-based methods such as Laplace transform (LP) [16] and gradient transform [17]; wavelet transform (WT)-based methods such as discrete wavelet transform (DWT) [18]; and dual-tree complex wavelet transform (DTCWT) [19], and burgeoning geometric analysis methods such as Curvelet transform [20], Contourlet transform [21] and Shearlet transform [22].The MRA approaches extract high-pass spatial detail information from the PAN image and then inject them into each band of the MS image, interpolating at the same resolution as the high-resolution PAN image [23].Compared to the CS methods, the MRA methods can preserve spectral characteristics in the sharpened image, but they usually suffer from spatial distortion problems, such as ringing or stair-casing phenomena [24].
To overcome the limits of MRA and CS methods, many researchers jointly adopted the CS and MRA methods to fuse the MS and PAN images [25].These category methods are based on a CS frame.They fuse the representation component of the MS image and the PAN image by the use of various multiscale transforms.In addition, some predefined fusion rules are used in the fusion process.Then, the corresponding inverse transform is applied in the fused component and the other components of the MS image to obtain the final sharpened image.For example, Cheng et al. [26] jointly adopted IHS and WT to fuse IKONOS and QuickBird satellite images, in which WT was applied to the intensity component of the MS images and PAN image.The spectral information of the resultant images was improved through this method.Dong et al. [27] combined the IHS with the Curvelet transform to fuse remote sensing images, utilizing the better edge preservation property of Curvelet to enhance the spatial quality of the fused images.Shah et al. [28] proposed an adaptive PCA and Contourlet transform-based pan-sharpening method, in which the adaptive PCA was used to reduce the spectral distortion.Ourabia et al. [29] tried to find a compromise between the spatial resolution enhancement and the spectral information preservation at the same time by using enhanced PCA and nonsubsampled Contourlet transform (NSCT).
It can be seen that the foregoing hybrid methods improve performance to some extent, compared to the previous CS-or MRA-based pan-sharpening methods, but there are also some drawbacks to these frames.For example, the IHS transform-based frameworks are only feasible for three bands images, and the PCA transform-based frameworks result in serious spectral distortion.To improve the pan-sharpening performance, a novel pan-sharpening framework that is based on the matting model and multiscale transform is proposed in this paper.First, the I component of the MS image is selected as the alpha channel to estimate the foreground and background images with a local linear assumption.Then, the upsampled I component and the PAN image are fused by adopting a discretionary multiscale transform tool.Finally, a high-resolution sharpened MS image can be achieved by a composition operation on the fused image with the upsampled foreground and background images.The proposed framework has no limits regarding the amount of MS image bands.Thus, it can be directly applied for three, four, and even eight bands of MS images.
Furthermore, in terms of the multiscale transform, due to limits on directional selectivity, the classical WT can be sensitive to point-wise singularities, but cannot capture other types of salient features, such as lines.Therefore, WT often causes artifacts and Gibbs effects in the final fused results.To better represent high-order singular features, researchers have put forward a number of more effective multiresolution geometric analysis tools, such as the aforementioned Curvelet transform and Contourlet transform.These transforms are anisotropic and have good directional selectivity, so they can accurately represent the image edge information in different scales and directions.However, because there are down-sampling operations in their decompositions, they lack shift-invariance, resulting in the fused result being affected by the noise or mis-registration of source images.To overcome this disadvantage, Cunha et al. [30] proposed the NSCT, which is an improved version of the Contourlet transform.However, the computational complexity of NSCT is high, and thus the fusion process consumes too much time.
In recent years, an excellent multiresolution analysis tool named nonsubsampled Shearlet transform (NSST) was put forward and has been applied extensively [31].The NSST is not only shift invariant, but also multiscale, and it exhibits multidirectional expansion; it maps the standard shearing filters from a false polarization grid system into a Cartesian coordinate system directly during the multidirectional decomposition procedure.This mapping process can maintain well deserved multiscale analyses properties while drastically reducing computation complexity.Thus, as an example, we choose the NSST in this paper due to its superiority compared to other multiscale transforms.Certainly, it can be extended to any other multiscale transform approach in the proposed framework if necessary.
In the multiscale transform image fusion process, the coefficient fusion rules are crucial.For the low-frequency coefficients, a gradient domain-based adaptive weighted averaging rule is proposed.Furthermore, as to the high-frequency coefficients, the simple but effective spatial frequency (SF) fusion rule is designed.Experiments on WorldView-2, QuickBird, and IKONOS satellite images demonstrate that the proposed method outperforms several state-of-the-art pan-sharpening methods in terms of both subjective and objective measures.
This paper is organized as follows.We address relevant works about the matting model and NSST theories (Section 2), introduce the proposed method in detail (Section 3), and display the evaluation metrics and fusion result discussion (Section 4).Conclusions and some future works are presented in Section 5.

Image Matting Model and Application in Pan-Sharpening
In image matting [32] theory, an input image can be accurately distinguished into a foreground image F and a background image B through a linear composite model as follows: where α, named the alpha channel, is the opacity of the foreground image F, and m refers to the mth pixel.Figure 1 shows an example image (Figure 1a), its corresponding alpha channel (Figure 1b), foreground image (Figure 1c), and background image (Figure 1d).In general, obtaining the alpha channel is the crucial process for image matting.It is usually selected according to the researcher's experience or through trial and error; the result can be very subjective.According to the local linear assumption of the matting model [32], the corresponding spectral foreground and background colors in a local window of an image will be spatially smooth if the alpha channel contains most of the edge information of this image in the window.For example, if the I component (Figure 1e) is used as the alpha channel, the foreground color (Figure 1f) and background color (Figure 1g) are rich in spectral information, but not in spatial information.While the input image and alpha channel α are decided, the spatially smooth foreground image F and background image B can be estimated by solving the following energy function: where  Certainly, the source input image can be restructured by combining α , foreground image F, and background image B, with Formula (1).Motivated by the good performance of the matting model, Kang et al. [33] proposed a simple and effective pan-sharpening method, in which the downsampled PAN image was regarded as the alpha channel to obtain the foreground and background images of the MS image.Then, the sharpened high-resolution MS image was produced, combining the original PAN image with the upsampled foreground and background images.This method belongs to the CS category, and it used the PAN image to replace the original alpha channel.The characteristics of the PAN image and MS image are not completely identical due to the different signal-to-noise ratios in the remote sensing imaging process, so the direct substitution approach may result in spectral distortion.Instead of using a PAN image, in the proposed framework, the I component is used as the alpha channel first, and then the fused image of the I component and PAN image is adopted to replace the original alpha channel.

NSST Image Decomposition
Shearlet is a particular case of the continuous wavelet with superior directional sensitivity [34].In dimension 2 n = , the continuous Shearlet transform is defined as the mapping where , 2 ( ) L ⋅ represents square integrable space, R represents a real number, and Z represents an integer.The value A denotes the anisotropy matrix for multi-scale partitions, and S Certainly, the source input image can be restructured by combining α, foreground image F, and background image B, with Formula (1).Motivated by the good performance of the matting model, Kang et al. [33] proposed a simple and effective pan-sharpening method, in which the downsampled PAN image was regarded as the alpha channel to obtain the foreground and background images of the MS image.Then, the sharpened high-resolution MS image was produced, combining the original PAN image with the upsampled foreground and background images.This method belongs to the CS category, and it used the PAN image to replace the original alpha channel.The characteristics of the PAN image and MS image are not completely identical due to the different signal-to-noise ratios in the remote sensing imaging process, so the direct substitution approach may result in spectral distortion.Instead of using a PAN image, in the proposed framework, the I component is used as the alpha channel first, and then the fused image of the I component and PAN image is adopted to replace the original alpha channel.

NSST Image Decomposition
Shearlet is a particular case of the continuous wavelet with superior directional sensitivity [34].In dimension n = 2, the continuous Shearlet transform is defined as the mapping where ψ ∈ L 2 (R 2 ), L 2 (•) represents square integrable space, R represents a real number, and Z represents an integer.The value A denotes the anisotropy matrix for multi-scale partitions, and S denotes the shear matrix for directional analysis; they are both 2 × 2 invertible matrices and |detS|= 1 .The values j, l, and k are scale, direction and shift parameters, respectively.For each a > 0 and S ∈ R, the matrices of A and S are given as follows: Assume a = 4, and s = 1; at the moment, the continuous wavelet is the Shearlet.Equation ( 4) can be modified further: NSST is composed of two phases including multi-scale decomposition and multi-directional decomposition.In the multi-scale decomposition process, the non-subsampled Laplacian pyramid transform (NSLP) is utilized.Thus, it has superior performance in terms of shift-invariance.After the jth level scale decomposition, an image can be decomposed into j + 1 sub-bands with the same size of the source image, in which one sub-band image is the low-frequency component and other m images are the high-frequency sub-band images.In the multi-directional decomposition process, the realization is via the improved Shearlet filters.These filters are formed by avoiding subsampling to satisfy the property of shift-invariance.Shearlet filters allow direction decomposition l with stages in high-frequency images from NSLP at each level and produce 2 l + 2 directional sub-images that are the same size as the source image.The NSST is a fully shift-invariant, multi-scale, and multi-directional expansion.Figure 2 shows a three-level NSST decomposition model that unfolds the NSLP and its corresponding directional decompositions.Figure 3 illustrates the one-level NSST decomposition on a WorldView-2 satellite image.Figure 3b is a low-frequency image that represents the source image approximate information.Figure 3c shows the high-frequency sub-band images, from which it is possible to observe the anisotropic and better directional selectivity of NSST.For each 0 a > and S ∈ R , the matrices of A and S are given as follows: Assume 4 a = , and 1 s = ; at the moment, the continuous wavelet is the Shearlet.Equation (4) can be modified further: NSST is composed of two phases including multi-scale decomposition and multi-directional decomposition.In the multi-scale decomposition process, the non-subsampled Laplacian pyramid transform (NSLP) is utilized.Thus, it has superior performance in terms of shift-invariance.After the j th level scale decomposition, an image can be decomposed into 1 j + sub-bands with the same size of the source image, in which one sub-band image is the low-frequency component and other m images are the high-frequency sub-band images.In the multi-directional decomposition process, the realization is via the improved Shearlet filters.These filters are formed by avoiding subsampling to satisfy the property of shift-invariance.Shearlet filters allow direction decomposition l with stages in high-frequency images from NSLP at each level and produce 2 2 l + directional sub-images that are the same size as the source image.The NSST is a fully shift-invariant, multi-scale, and multi-directional expansion.Figure 2 shows a three-level NSST decomposition model that unfolds the NSLP and its corresponding directional decompositions.Figure 3 illustrates the one-level NSST decomposition on a WorldView-2 satellite image.Figure 3b is a low-frequency image that represents the source image approximate information.Figure 3c shows the high-frequency sub-band images, from which it is possible to observe the anisotropic and better directional selectivity of NSST.

The Overall Matting Model-Based Pan-Pharpening Framework
The most frequently used CS and multiscale joint pan-sharpening methods are mostly based on the IHS or PCA, but due to various problems with these models, many researchers have put forward improved approaches.Here, the proposed matting model-based pan-sharpening framework is presented.The schematic diagram of this framework is shown in Figure 4.It mainly consists of two parts: spectral estimation based on the matting model, and image fusion based on the multiscale transform.This framework is suitable for any multiscale transform method (e.g., Wavelet, Contourlet, Shearlet).Here, we emphasize introducing the NSST that is applied in this framework.The detailed procedures of the proposed framework are as follows: (1) Calculate the intensity component I of the low-resolution resource MS image.
where n is the bands of the MS image.
(2) Estimate the foreground color F and background color B by using the Formula (2) in which the I component is used as the alpha channel.(3) Upsample the foreground color F, background color B, and I component to the same size as the PAN image by using the bicubic interpolation algorithm.(4) Fuse the upsampled I component and the PAN image through any applicable multiscale transform-based image fusion method.In this paper, the NSST is adopted as a detailed example.(5) Once the multiscale fusion result is obtained, the final high-resolution sharpened MS image can be achieved by combining the upsampled foreground color F, background B, and the multiscale fusion result (the alpha channel) together by using Formula (1).

The Overall Matting Model-Based Pan-Pharpening Framework
The most frequently used CS and multiscale joint pan-sharpening methods are mostly based on the IHS or PCA, but due to various problems with these models, many researchers have put forward improved approaches.Here, the proposed matting model-based pan-sharpening framework is presented.The schematic diagram of this framework is shown in Figure 4.It mainly consists of two parts: spectral estimation based on the matting model, and image fusion based on the multiscale transform.This framework is suitable for any multiscale transform method (e.g., Wavelet, Contourlet, Shearlet).Here, we emphasize introducing the NSST that is applied in this framework.The detailed procedures of the proposed framework are as follows: (1) Calculate the intensity component I of the low-resolution resource MS image.
where n is the bands of the MS image.(2) Estimate the foreground color F and background color B by using the Formula (2) in which the I component is used as the alpha channel.(3) Upsample the foreground color F, background color B, and I component to the same size as the PAN image by using the bicubic interpolation algorithm.(4) Fuse the upsampled I component and the PAN image through any applicable multiscale transform-based image fusion method.In this paper, the NSST is adopted as a detailed example.(5) Once the multiscale fusion result is obtained, the final high-resolution sharpened MS image can be achieved by combining the upsampled foreground color F, background B, and the multiscale fusion result (the alpha channel) together by using Formula (1).

NSST-Based Multiscale Transform Image Fusion
By use of NSST multiscale decomposition, an image can generate a low-frequency sub-band coefficient image and a series of high-frequency sub-band coefficient images.The low-frequency sub-band is the approximate version of the source image, containing the main information of the source image.In the low-frequency coefficients fusion procedure of the upsampled I and PAN image, if we only use the sub-band of upsampled I as fusion coefficients, the spectral information of the MS image can be maintained commendably, but the spatial quality will decrease.The contrary is the case when only the sub-band of the PAN image is used.In this paper, we design a gradient domain-based adaptive weighted averaging rule for low-frequency coefficients fusion, which can utilize the characteristics of the I component and PAN image.For high-frequency coefficients, which represent the edge and texture information of the source image, we propose to adopt the local spatial SF as the fusion rule to select the high-frequency coefficients.Figure 5 illustrates the flow chart of the upsampled I and PAN image fusion based on NSST, which can be summarized as follows: (1) Histogram matching: Histogram matching on the PAN image is performed by using the upsampled I component as a reference image.This step aims at creating the same mean value and variance in the PAN image as in the I component, reducing spectral distortion in the fusion process.

NSST-Based Multiscale Transform Image Fusion
By use of NSST multiscale decomposition, an image can generate a low-frequency sub-band coefficient image and a series of high-frequency sub-band coefficient images.The low-frequency sub-band is the approximate version of the source image, containing the main information of the source image.In the low-frequency coefficients fusion procedure of the upsampled I and PAN image, if we only use the sub-band of upsampled I as fusion coefficients, the spectral information of the MS image can be maintained commendably, but the spatial quality will decrease.The contrary is the case when only the sub-band of the PAN image is used.In this paper, we design a gradient domain-based adaptive weighted averaging rule for low-frequency coefficients fusion, which can utilize the characteristics of the I component and PAN image.For high-frequency coefficients, which represent the edge and texture information of the source image, we propose to adopt the local spatial SF as the fusion rule to select the high-frequency coefficients.Figure 5 illustrates the flow chart of the upsampled I and PAN image fusion based on NSST, which can be summarized as follows: (1) Histogram matching: Histogram matching on the PAN image is performed by using the upsampled I component as a reference image.This step aims at creating the same mean value and variance in the PAN image as in the I component, reducing spectral distortion in the fusion process.(2) NSST decomposition: The upsampled I and the matched PAN image are decomposed to obtain the relevant different scale and the directional sub-band coefficients L I j 0 , H I j,l and L P j 0 , H P j,l , where L I j 0 and L P j 0 are the low-frequency sub-band coefficients, H I j,l and H P j,l are the jth scale, lth directional high-frequency sub-band coefficients.
(3) Low-frequency coefficients fusion: The low frequency fused coefficients L F j 0 are obtained by using a gradient domain-based adaptive weighted averaging rule.(4) High-frequency coefficients fusion: The local SF maxima rule is adopted to obtain the high-frequency fused coefficients H F j,l .(5) Inverse NSST: The inverse NSST is applied to the fused low-and high-frequency coefficients to obtain the fused image F.
( (5) Inverse NSST: The inverse NSST is applied to the fused low-and high-frequency coefficients to obtain the fused image F.

Low-Frequency Coefficients Fusion Algorithm
The low-frequency sub-band, obtained by the NSST decomposition, contains the main energy and represents the approximation component of the source image.At present, the processing of lowfrequency sub-band coefficients usually uses the simple averaging rule, which reduces the fused image contrast and loses some useful information of the source image.The accurate choice of lowfrequency coefficients decides the quality of the fused image.In this paper, an adaptive weighted averaging in the gradient field rule is proposed for low-frequency sub-band coefficient fusion.The image gradient can be regarded as a clarity index in the space domain; the pixels with greater gradient values are considered useful information, such as edge or clearer area information, and selected as the fused pixels.Therefore, we used the gradient information of low-frequency coefficients to design the weight averaging factor ω .

Low-Frequency Coefficients Fusion Algorithm
The low-frequency sub-band, obtained by the NSST decomposition, contains the main energy and represents the approximation component of the source image.At present, the processing of low-frequency sub-band coefficients usually uses the simple averaging rule, which reduces the fused image contrast and loses some useful information of the source image.The accurate choice of low-frequency coefficients decides the quality of the fused image.In this paper, an adaptive weighted averaging in the gradient field rule is proposed for low-frequency sub-band coefficient fusion.The image gradient can be regarded as a clarity index in the space domain; the pixels with greater gradient values are considered useful information, such as edge or clearer area information, and selected as the fused pixels.Therefore, we used the gradient information of low-frequency coefficients to design the weight averaging factor ω.
First, the gradient values G I (x, y), G P (x, y) of the low-frequency sub-band images L I (x, y), L P (x, y) are obtained.Then, for the pixel (x, y), the gradient rate ρ = G I (x, y)/G P (x, y) is calculated, which represents the feature of the sub-band image content.The greater value of ρ indicates that the pixel (x, y) area of the I component contains more detailed information than the PAN image.The weight averaging factors ω(x, y) are obtained by a sigmoid function; here, a new discrete sigmoid function was constructed.
where K > 1 and is an odd number, it represents the shrink factor of the sigmoid function.
Remote Sens. 2017, 9, 391 9 of 21 After obtaining the weight averaging factor, the adaptive low-frequency coefficient fusion rule can be written as: From Formula (7), it can be seen that when K → +∞ , the proposed fusion rule is equivalent to the choose-max scheme.For the same K, when G I (x, y) • G P (x, y) → 0 , the weight coefficient ω(x, y) is closer to zero, and the fused coefficient mainly selects from the source PAN image; otherwise, when G I (x, y) • G P (x, y) >> 1, the weight coefficient ω(x, y) is closer to 1, and the fused coefficient mainly chooses from the I component; when G I (x, y) • G P (x, y) → 1 , the weight coefficient ω(x, y) is closer to one-half, and the proposed fusion rule can be seen as the weighting average fusion scheme.Therefore, the proposed fusion rule is adaptive because it can dynamically select the weighted value according to the coefficient features of the low-frequency sub-band image.

High-Frequency Coefficients Fusion Algorithm
After the decomposition of NSST, each source image can obtain a series of high-frequency sub-band images.The different scale high-frequency coefficients of the NSST not only provide the multi-scale information but also include plentiful edge and texture detail information of the source image.At the same scale, the coefficient absolute values are greater when the edge and texture features are more obvious.Thus, the absolute maximum is usually used as the high-frequency coefficient selection rule, but this rule ignores the correlation between the neighborhood pixels and may bring in noise to the fused image.Local SF can reflect the pixel neighborhood active index; the larger the SF value, the more active the pixel points in the local area are.Therefore, we propose to employ the local SF to fuse the high-frequency coefficients.The formula of SF is as follows: where RF and CF are row frequency and column frequency, respectively, which are defined as follows: where (2M + 1)(2N + 1) is the local window size.
Then the high frequency coefficients selection rule based on the SF can be written as:

Experiments and Discussion
To estimate the performance of the proposed pan-sharpening framework, some experiments have been performed on three different satellite datasets, which will be introduced in detail next.Several common evaluation indexes in the pan-sharpening field are explained.In addition, some comparison tests are implemented to compare the proposed framework with IHS-and PCA-based methods and other existing state-of-the-art pan-sharpening methods.

Datasets
In this paper, the proposed method has been conceived for very high-resolution datasets, and it has been tested on data acquired by some of the most advanced systems for remote sensing of optical data.These include WorldView-2, QuickBird, and IKONOS; their main characteristics are summarized in Tables 1 and 2. This sensors selection allows us to study the robustness of the proposed method with respect to both spectral and spatial resolution [35].The WorldView-2 dataset used in this paper was obtained from the literature [36], which is an open data share platform; it was collected by the Beijing key laboratory of digital media.The size of the MS images and PAN images in this dataset are 128 × 128 and 512 × 512, respectively.Simultaneously, the reference images were also provided in this dataset, so we directly used the reference images as a standard for objective evaluation.However, WorldView-2 is an 8-band MS satellite; this database only provided the 5 (R), 3 (G), and 2 (B) bands of the MS image as the real color image.In order to check the performance of the proposed framework in the case of a multiband (more than three bands), two other types of datasets, IKONOS [37] and QuickBird [38], were utilized.These two datasets are all captured from the four-band MS satellite.The size of the MS image and PAN image of IKONOS used in this paper are 256 × 256 and 1024 × 1024, respectively; those for QuickBird images are 512 × 512 and 2048 × 2048, respectively.Because there is no high resolution MS image in these two datasets, we degrade the original MS and PAN images by a factor of 4 using the protocol in literature [39] to yield the MS and PAN images of IKONOS with pixel size 64 × 64 and 256 × 256, and yield the MS and PAN images of QuickBird with pixel size 128 × 128 and 512 × 512.Then, the corresponding original MS images were used as the reference image to compare the fused image.

Quality Assessment of Fusion Results
Generally, we can measure the performance of an image fusion method through subjective and objective evaluations.The clarity of the objects and the proximity of the colors between the fused images and the original MS images are usually taken into account for subjective evaluation.However, it is difficult to compare the fusion quality accurately based solely on subjective evaluation.
To quantitatively evaluate the image fusion methods, several indexes are adopted to assess the performance of different fusion methods.In the experiments, seven well-known indexes are used and introduced in detail as follows: (1) Correlation Coefficient (CC): The CC of the fused image and reference image reflects the similarity of spectral features.The two images are correlated when the CC is close to 1. (2) Universal Image Quality Index (UIQI) [40]: The UIQI is used to measure the inter-band spectral quality of the fused image, the optimum value of which is 1.A value closer to 1 indicates that the quality of the fused image is better.(3) Mean Square Error (RMSE) [41]: The RMSE is widely used to assess the difference between the fused image F and the reference image R by calculating the changes in pixel values.The smaller the RMSE, the closer the fused image to the reference image.
(4) Relative Average Spectral Error (RASE) [42]: The RASE reflects the average performance of the fusion method in the spectral error aspect.The ideal value of RASE is 0. (5) Spectral Angle Mapper (SAM) [43]: The SAM reflects the spectral distortion between the fused image F and the reference image R. The smaller value of SAM denotes less spectral distortion in the fused image.(6) Erreur Relative Global Adimensionnelle de Synthèse (ERGAS) [44]: The ERGAS reflects the overall quality of the fused image.It represents the difference between the fused image F and the reference image R. A small ERGAS value means small spectral distortion.( 7) Quality with no reference (QNR) [45]: The QNR, which is composed of the spectral distortion index D λ and the spatial distortion index D s , reflects the overall quality of the fused image.The best value of QNR is 1.It is typically utilized to ensure that the quantitative evaluation is without a reference image.In this paper, we used the reference image instead of the original MS image for spectral assessment.

Performance Compariosn with IHS-and PCA-Based Pan-Sharpening Methods
IHS-and PCA-based hybrid methods are classical methods in pan-sharpening, and a great number of studies are based on these two theories.To compare the performance of the IHS-and PCA-based joint frameworks with the proposed matting model framework, we experimented on a WorldView-2 dataset by immobilizing the multiscale transform and used only the simple three-level WT as the multiscale transform tool; the simple average and max approaches are used as the low-and high-frequency coefficient fusion rules, respectively.The fused results and their corresponding objective evaluation are shown in Figure 6 and Table 3, respectively.The CC, UIQI, RASE, RMSE, SAM, ERGAS, and QNR are utilized to assess the quality of the fused images.The numbers in the brackets refer to the ideal index values.
Remote Sens. 2017, 9, 391 11 of 21 (4) Relative Average Spectral Error (RASE) [42]: The RASE reflects the average performance of the fusion method in the spectral error aspect.The ideal value of RASE is 0. (5) Spectral Angle Mapper (SAM) [43]: The SAM reflects the spectral distortion between the fused image F and the reference image R .The smaller value of SAM denotes less spectral distortion in the fused image.( 6) Erreur Relative Global Adimensionnelle de Synthèse (ERGAS) [43]: The ERGAS reflects the overall quality of the fused image.It represents the difference between the fused image F and the reference image R .A small ERGAS value means small spectral distortion.( 7) Quality with no reference (QNR) [45]: The QNR, which is composed of the spectral distortion index D λ and the spatial distortion index s D , reflects the overall quality of the fused image.
The best value of QNR is 1.It is typically utilized to ensure that the quantitative evaluation is without a reference image.In this paper, we used the reference image instead of the original MS image for spectral assessment.

Performance Compariosn with IHS-and PCA-Based Pan-Sharpening Methods
IHS-and PCA-based hybrid methods are classical methods in pan-sharpening, and a great number of studies are based on these two theories.To compare the performance of the IHS-and PCA-based joint frameworks with the proposed matting model framework, we experimented on a WorldView-2 dataset by immobilizing the multiscale transform and used only the simple three-level WT as the multiscale transform tool; the simple average and max approaches are used as the lowand high-frequency coefficient fusion rules, respectively.The fused results and their corresponding objective evaluation are shown in Figure 6 and Table 3, respectively.The CC, UIQI, RASE, RMSE, SAM, ERGAS, and QNR are utilized to assess the quality of the fused images.The numbers in the brackets refer to the ideal index values.Table 3. Objective evaluation of the experimental results shown in Figure 6.    Figure 6a is the MS image sized 128 × 128; Figure 6b is the PAN image sized 512 × 512; Figure 6c is the reference image.Figure 6d-f are the IHS, PCA, and proposed framework fused results.Due to the excellent effects, it is difficult to compare the performance only from visual contrast except that the IHS-fused image exhibits a bit of spectral distortion relative to the reference image.However, from Table 3, the superiority of the proposed framework compared to the IHS-and PCA-based methods can be easily observed in terms of both the spatial and spectral objective quality evaluation indexes.The implementation efficiency is also high, despite the proposed pan-sharpening framework being inferior to the IHS and PCA methods due to the algorithm in the matting model, and this problem will be improved with the development of graphics processing unit (GPU) technology.Thus, the proposed framework is feasible and effective in the pan-sharpening field.

Comparison with State-of-the-Art Methods
Several experiments have been carried out to compare the pan-sharpening performance of these methods, which include the proposed framework and some existing state-of-the-art approaches.Moreover, the source images acquired by different satellite datasets and different types of land cover were utilized to test the robust performance of the proposed framework.In this part, the proposed framework is respectively combined with DWT (set as experimental, as noted previously) and NSST to compete with the following excellent pan-sharpening approaches, which have for the most part appeared in recent years.
The first group experiment was performed on the WorldViwe-2 datasets coast area.Figure 7 shows the source images and their fusion results; Figure 7a,b are the MS image and PAN image; Figure 7c is the ground truth image; Figure 7d-l are the fusion results of different pan-sharpening methods.To observe the fusion results more clearly, we cut a sub-image placing at the top left corner in the fused images.A visual comparison indicates that the DWT-SR and Curvelet results exhibit spectral distortion phenomena; the result of the NSST-SR method shows improvement, but not remarkably.The GF, AWLP, and BFLP methods obtained better spectral information overall.However, from the sub-images, it is apparent that some serious artifacts occurred around the cars.The fusion results of the MM, MM-DWT, and MM-NSST methods are all close to the reference image, particularly the spectral aspect, which indicates the superiority of the matting model in spectral preservation.Undeniably, the MM-DWT result has spatial distortion in the road area.Due to the productive shift invariance in NSST, the proposed MM-NSST method can overcome the artifacts effectively; thus, apart from the outstanding spectral information, this method can obtain high spatial quality as well.Table 4 presents the objective evaluation results in Figure 7; although the largest SAM value is achieved by the MM method, the proposed MM-NSST method obtains the best values in other indexes as well as the second largest SAM value.In summary, the proposed pan-sharpening framework can obtain outstanding performance in terms of both spatial acquirement and spectral preserving.Different land cover types were considered with respect to performance, and the second group experiment was tested on the WorldView-2 datasets with urban land cover.Figure 8a-c are the source MS and PAN images and reference image, respectively.Figure 8d-l and Table 5 display the fusion result images and the objective evaluation results of different methods.Spatial and spectral information analyses were performed.Due to this area object's characteristics, apart from the fusion result of the DWT-SR method suffering from slight spectral distortions and the AWLP, BFLP methods have some degree of spatial blurring, and the fusion results of the other methods obtained fused images with good visual performance, which approximate the reference image.Therefore, there was no obvious comparison with subjective evaluation.However, the objective quality comparison results shown in Table 5 indicate that the proposed method obtained the best values for all evaluation indexes; thus, the proposed method achieves the highest spatial quality while  Different land cover types were considered with respect to performance, and the second group experiment was tested on the WorldView-2 datasets with urban land cover.Figure 8a-c are the source MS and PAN images and reference image, respectively.Figure 8d-l and Table 5 display the fusion result images and the objective evaluation results of different methods.Spatial and spectral information analyses were performed.Due to this area object's characteristics, apart from the fusion result of the DWT-SR method suffering from slight spectral distortions and the AWLP, BFLP methods have some degree of spatial blurring, and the fusion results of the other methods obtained fused images with good visual performance, which approximate the reference image.Therefore, there was no obvious comparison with subjective evaluation.However, the objective quality comparison results shown in Table 5 indicate that the proposed method obtained the best values for all evaluation indexes; thus, the proposed method achieves the highest spatial quality while maintaining the best spectral information.The third group experiment was performed on an uptown area from the WorldView-2 dataset.Usually, suburban areas contain abundant vegetation cover, which is commonly used in spectral contrast.Figure 9a is the low-resolution MS image, and Figure 9c is the high-resolution MS reference image.Using these two MS images as spectral references, we can visually compare the spectral maintaining performance of different pan-sharpening approaches.The spectral distortion of the DWT-SR method is the most serious; the vegetation area in particular is affected, although it obtains preferable spatial quality.The Curvelet and NSST-SR methods show some improvement.The GF method exhibits varying degrees of spectral aberrance and introduces some spectral information that does not exist in the source MS image.The AWLP, BFLP, MM, and proposed methods achieve  The third group experiment was performed on an uptown area from the WorldView-2 dataset.Usually, suburban areas contain abundant vegetation cover, which is commonly used in spectral contrast.Figure 9a is the low-resolution MS image, and Figure 9c is the high-resolution MS reference image.Using these two MS images as spectral references, we can visually compare the spectral maintaining performance of different pan-sharpening approaches.The spectral distortion of the DWT-SR method is the most serious; the vegetation area in particular is affected, although it obtains preferable spatial quality.The Curvelet and NSST-SR methods show some improvement.The GF method exhibits varying degrees of spectral aberrance and introduces some spectral information that does not exist in the source MS image.The AWLP, BFLP, MM, and proposed methods achieve better progress in spectral maintaining.Table 6 shows the objective quality assessments of this group experiment, from which we can see that the proposed MM-NSST method obtains the best effect compared with other methods.The IKONOS dataset was used in the fourth group experiment to compare the performance of the proposed framework on different satellite images.The MS image of this dataset includes four bands (nir red, red, green, and blue).We used the nir red, green and blue bands for display.Figure 10a,b are the degraded MS image and PAN image; Figure 10c is the original MS image and is used as a reference image.Similarly, we cut the sub-images and placed them at the top left corner in the fused images as well.Compared with the reference image, the fusion performance of the  The IKONOS dataset was used in the fourth group experiment to compare the performance of the proposed framework on different satellite images.The MS image of this dataset includes four bands (nir red, red, green, and blue).We used the nir red, green and blue bands for display.Figure 10a,b are the degraded MS image and PAN image; Figure 10c is the original MS image and is used as a reference image.Similarly, we cut the sub-images and placed them at the top left corner in the fused images as well.Compared with the reference image, the fusion performance of the DWT-SR method is unsatisfactory both in the spatial and spectral aspects.The Curvelet and NSST-SR methods improved with respect to the spatial aspect, but the spectral side needs further improvement.The GF method result still suffers from spectral distortion.Other resultant images look similar to the original MS image.The objective evaluation results shown in Table 7 indicate that the proposed MM-NSST method excels in six indexes and the QNR value is the second largest; this is testament to the superiority of our framework on the IKONOS dataset.The last group dataset used in this paper is obtained from the QuickBird satellite; it also consists of four bands in the MS image.We displayed the three band color images as in the IKONOS set.These experimental data focus on the suburban regions, which have a great deal of vegetation.Thus, it is more convenient for spectral information comparison.From Figure 11, it can be seen that all the pan-sharpening methods achieved decent spatial quality.The DWT-SR and Curvelet methods have slight spectral distortion compared to the reference image, and the spectral aberrance of the GF method result is much better in this dataset.Other methods maintain the spectral information of the  The last group dataset used in this paper is obtained from the QuickBird satellite; it also consists of four bands in the MS image.We displayed the three band color images as in the IKONOS set.These experimental data focus on the suburban regions, which have a great deal of vegetation.Thus, it is more convenient for spectral information comparison.From Figure 11, it can be seen that all the pan-sharpening methods achieved decent spatial quality.The DWT-SR and Curvelet methods have slight spectral distortion compared to the reference image, and the spectral aberrance of the GF method result is much better in this dataset.Other methods maintain the spectral information of the MS image effectively.Table 8 shows that the proposed MM-DWT method obtains the best performance in the QNR index and the proposed MM-NSST can obtain the best values in other indexes.Thus, the superiority of the proposed pan-sharpening framework is demonstrated again.In this part, the operating efficiency of the proposed method is analyzed.Because the proposed pan-sharpening framework is based on the multiscale transform, comparison with the non-multiscale transform method is meaningless.Thus, only the multiscale transform-based

Contrast on Running Time of Multiscale Transform-Based Methods
In this part, the operating efficiency of the proposed method is analyzed.Because the proposed pan-sharpening framework is based on the multiscale transform, comparison with the non-multiscale transform method is meaningless.Thus, only the multiscale transform-based methods are compared.Table 9 shows a comparison of the average running time of the preceding five groups of the multiscale transform-based methods in part 4.4.It can be seen that the running time of the multiscale transform-based pan-sharpening approaches is mainly decided by the transform method and the fusion rules.Obviously, the SR-based fusion rule is time consuming.Due to the more complex decomposition and reconstruction procedures, the NSST requires more time in multiscale transform image fusion than the DWT and Curvelet approaches, but NSST has been proven effective in the foregoing part.In addition, the proposed framework combined with DWT method is the fastest compared to other multiscale transform-based methods.Meanwhile, it should be noticed that in the previous five groups experiments, this method also achieved the desired performance (particularly for those groups on the IKONOS and QuickBird datasets), although the simplest fusion rules were used.Thus, the proposed framework can simultaneously meet the conditions of satisfaction in efficiency and effectiveness if the appropriate multiscale transform method and fusion rules are utilized.

Implementation Details
In this section, we provide some implementation details for the proposed method and other comparative methods.All codes are implemented in MATLAB 8.3 (running on a Windows 10 system PC with Intel Core i5-350 3.30-GHz processor and 8-GB memory).All of the resample operations in this paper are bicubic interpolation algorithms.In our method, three-level NSST decomposition with a (30,40,60) shearing filter matrix and (2, 3, 4) direction parameters are applied to the upsampled I component and matched PAN image, respectively.The pyramid filter is 'pyrexc' wavelet.The parameter K in NSST low-frequency coefficients fusion is 99, and the window size of the local SF in high-frequency coefficient fusion is 3 × 3. The other pan-sharpening techniques for performance comparison are well known; the experiment parameters in these methods are set as the references.

Conclusions
Spectral image pan-sharpening is an important process in remote sensing research and applications.Addressing existing problems in the pan-sharpening area, this paper presented a new pan-sharpening framework by exploiting the matting model and multiscale transform.The proposed framework first obtains the spectral foreground and spectral background of the low-resolution MS image with a matting model, in which the I component of the MS image was used as the alpha channel.Then, the PAN image and the upsampled I component were fused by using a proposed multi-scale image fusion method.The fused image can be used as a new alpha channel to obtain a high-resolution MS image based on a linear combination with upsampled spectral foreground and spectral background images.In this paper, the NSST was introduced in detail as a multi-scale transform example.Through our framework, the PAN and MS image fusion problem was solved.To improve the performance of our method, an adaptive weighted averaging in gradient field rule and a local SF rule were proposed to fuse the low-frequency and high-frequency sub-band coefficients, respectively.Experimental results on different satellite datasets and different land cover certify that the proposed method can achieve superior spatial detail from the PAN image while preserving more spectral information from the MS image compared to some existing pan-sharpening methods.Therefore, the results produced by the proposed method could be better used in the remote sensing applications such as image interpretation, water quality evaluation and vegetated areas research.Importantly, the proposed framework is suitable for any multiscale transform tool to cope with the pan-sharpening research; thus, this framework has bright prospects.
In future works, we would like to improve the effectiveness of the proposed framework by constructing some more powerful multiresolution analysis tools with different Wavelet basis functions.We would also like to extend the application ranges to diverse sensor data, such as optical and radar remote sensing images, thermal infrared and PAN images.Moreover, as in most pan-sharpening literature, we used completely registered satellite images for experiments.We plan to collect some datasets with temporal and instrumental change, and analyze the robustness of the proposed framework on them.
c is the cth color channel.The values F c mx , F c my , B c mx , B c my , α mx , and α my are the horizontal and vertical derivatives of the spectral foreground F c , spectral background B c , and alpha channel α.Remote Sens. 2017, 9, 391 4 of 21 the input image and alpha channel α are decided, the spatially smooth foreground image F and background image B can be estimated by solving the following energy function: my α are the horizontal and vertical derivatives of the spectral foreground c F , spectral background c B , and alpha channel α .

Figure 4 .
Figure 4.The schematic diagram of the proposed pan-sharpening framework.

Figure 4 .
Figure 4.The schematic diagram of the proposed pan-sharpening framework.

LH
) NSST decomposition: The upsampled I and the matched PAN image are decomposed to obtain the relevant different scale and the directional sub-band coefficients are the low-frequency sub-band coefficients, are the j th scale, l th directional high-frequency sub-band coefficients.(3)Low-frequency coefficients fusion: The low frequency fused coefficients 0 F j L are obtained by using a gradient domain-based adaptive weighted averaging rule.(4) High-frequency coefficients fusion: The local SF maxima rule is adopted to obtain the highfrequency fused coefficients ,

Figure 5 .
Figure 5.The flow chart of upsampled I and matched panchromatic (PAN) image fusion using NSST.
G x y of the low-frequency sub-band images { ( , ), ( , )} I P L x y L x y are obtained.Then, for the pixel ( , ) x y , the gradient rate ( , ) / ( , ) represents the feature of the sub-band image content.The greater value of ρ indicates that the pixel ( , ) x y area of the I component contains more detailed information than the PAN image.The weight averaging factors ( , ) x y ω are obtained by a sigmoid function; here, a new discrete sigmoid function was constructed.

Figure 5 .
Figure 5.The flow chart of upsampled I and matched panchromatic (PAN) image fusion using NSST.
S = .The values j , l , and k are scale, direction and shift parameters, respectively.

Table 1 .
Spatial resolutions of the satellite datasets used in this study.

Table 2 .
The wavelength range (nm) of spectral bands in different satellite datasets used in this paper.

Table 3 .
Objective evaluation of the experimental results shown in Figure6.

Table 4 .
Objective evaluation of the experimental results shown in Figure7.

Table 4 .
Objective evaluation of the experimental results shown in Figure7.

Table 5 .
Objective evaluation of the experimental results shown in Figure8.

Table 5 .
Objective evaluation of the experimental results shown in Figure8.

Table 6 .
Objective evaluation of the experimental results shown in Figure9.

Table 6 .
Objective evaluation of the experimental results shown in Figure9.

Table 7 .
Objective evaluation of the experimental results shown in Figure10.

Table 7 .
Objective evaluation of the experimental results shown in Figure10.

Table 8 .
Objective evaluation of the experimental results shown in Figure11.
4.5.Contrast on Running Time of Multiscale Transform-Based Methods

Table 8 .
Objective evaluation of the experimental results shown in Figure11.

Table 9 .
Running time comparison of the multiscale transform-based methods.