With the development of satellite technology and imaging systems, remote sensing images of ever higher resolution have become available. High-resolution remote sensing images describe target information more accurately, which is of great significance for numerous applications, such as environmental monitoring, land and resource planning, military mapping, object recognition, and scene interpretation. Most current commercial satellites, such as QuickBird, IKONOS, GeoEye, and WorldView, can jointly acquire panchromatic (PAN) and multispectral (MS) images. Owing to physical imaging constraints and transmission bandwidth limits, it remains difficult to obtain images with both high spatial resolution and high spectral resolution. Fusion is one of the most important and effective means of improving the interpretability of remote sensing images, and how to combine the complementary information of PAN and MS images is an urgent problem to be solved.
MS images have the advantage of high spectral resolution, while PAN images have the advantage of high spatial resolution. The purpose of fusion is to combine the complementary characteristics of the original images and provide enough information for image interpretation. The component substitution (CS) method is a traditional pansharpening model, which includes intensity-hue-saturation transformation (IHS) [1], principal component analysis (PCA) [2], the Gram-Schmidt process (GS) [3], and so on. These methods offer outstanding spatial quality, but they suffer from serious spectral distortion in the fused image. Because of its low computational complexity, the CS method has been extended and improved with new theories over time [4,5]. In [6], an adaptive fusion method based on partial replacement of the intensity component was proposed, known as partial replacement adaptive CS (PRACS). A context-adaptive (CA) pansharpening method based on image segmentation was proposed in [7]; it is integrated into the GS scheme to achieve a better estimation of the injection coefficients. The band-dependent spatial-detail (BDSD) model is also known as adaptive CS [8]. Model-based methods have gathered increasing interest in recent studies. Methods of this kind, built on complex models, can achieve a better pansharpening effect in some cases, but their time complexity is high due to the optimization process. Within this family, many contributions based on Bayesian methods rely on sparse representations of signals [9,10] and total-variation penalization terms [11,12]. Essentially, this can be regarded as an image-restoration strategy, in which the high-resolution MS image is reconstructed from the original data [13]. Multi-resolution analysis (MRA) methods, which decompose an image into coefficients at different frequencies, can balance the injection of spatial details with the preservation of spectral information. The MRA scheme injects detail information, extracted from the decomposition coefficients of the PAN image, into the low-resolution MS bands. The wavelet transform is one such MRA method and, as a mathematical tool, an important milestone in image processing [14]. Wavelet-based fusion provides certain improvements in preserving spectral information, but it also has shortcomings, such as limited directionality, shift sensitivity, and aliasing. Compared with the discrete wavelet transform (DWT), the dual-tree complex wavelet transform (DTCWT) [15] offers shift invariance and directional selectivity, but the limited number of directions in wavelet families makes it difficult to represent the textures and edges of two-dimensional (2-D) images. To address this, a number of multi-scale geometric analysis tools, such as the curvelet transform [16], contourlet transform [17], and shearlet transform [18], have been developed and successfully applied to the pansharpening problem. The main motivation of multi-scale geometric analysis is to pursue a "true" 2-D transform [19] that can effectively capture the geometric structure of an image, so that fusion quality can be further improved. However, lacking the shift-invariance property, the contourlet and shearlet transforms may suffer from frequency aliasing. The non-subsampled contourlet transform (NSCT) [20,21] is an effective solution to this problem, but its application is limited by its finite decomposition directions and high computational complexity. The non-subsampled shearlet transform (NSST) is a shift-invariant version of the shearlet transform that attains a low computational cost and good sparse-representation performance [22]. The appearance of NSST has thus provided a new solution to the pansharpening problem, and as a novel multi-scale geometric analysis method it is currently being studied by many researchers. Moonon [23] proposed a remote sensing image fusion method based on NSST and sparse representation. Wu proposed a method based on improved non-negative matrix decomposition in the NSST domain [24] and a fusion method using chaotic bee colony optimization in the NSST domain [25]. Yang [26] proposed a pansharpening framework based on the matting model and multi-scale transforms. These methods perform well for pansharpening, although each is subject to its own limitations.
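To make the distinction between the CS and MRA families concrete, the following sketch implements both injection rules in their simplest form on synthetic data. The uniform intensity estimate (band mean), the box-blur detail extractor, and the unit injection gains are illustrative assumptions for this sketch, not the settings of any particular method cited above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def box_mean(img, r):
    """Mean over a (2r+1)x(2r+1) window with edge padding."""
    p = np.pad(img, r, mode="edge")
    return sliding_window_view(p, (2 * r + 1, 2 * r + 1)).mean(axis=(2, 3))

def cs_fuse(ms, pan):
    """CS-style rule: substitute the mean intensity of the
    (already upsampled) MS bands with the PAN image."""
    intensity = ms.mean(axis=0)
    return ms + (pan - intensity)      # broadcast over bands

def mra_fuse(ms, pan, r=2):
    """MRA-style rule: inject the PAN high-frequency detail
    (PAN minus its low-pass version) into every MS band."""
    detail = pan - box_mean(pan, r)
    return ms + detail

rng = np.random.default_rng(0)
ms = rng.random((4, 32, 32))           # 4 spectral bands, pre-upsampled
pan = rng.random((32, 32))

fused_cs = cs_fuse(ms, pan)
fused_mra = mra_fuse(ms, pan)
```

With unit gains, the CS rule makes the per-pixel band mean of the fused image equal PAN exactly (full spatial substitution, which is why spectral distortion is a risk), while the MRA rule adds the same detail to every band, leaving inter-band differences, i.e., the spectral shape, untouched.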
The lack of an anti-aliasing property in multi-scale decomposition tends to cause decision bias in the boundary regions of objects. This bias results in artificial textures and image non-uniformity, which adversely affect visual quality and image interpretation. To address this problem, spatial techniques and optimization strategies have been introduced into fusion methods, such as the bilateral filter [27], cross bilateral filter (CBF) [28], weighted least squares filter [29], and guided image filter (GIF) [30,31]. The GIF is one of the fastest edge-preserving local filters, and it is superior to the bilateral filter in avoiding gradient reversal. Meng [32] proposed a pansharpening method with an edge-preserving guided filter based on three-layer decomposition, in which the decomposed PAN image is injected into the MS image. However, because of the fixed regularization value in the GIF, edges are inevitably smoothed. Li [33] proposed a weighted GIF, which avoids the edge-blurring problem to some extent. However, neither method imposes explicit constraints on edge processing, and the filtering process is usually accompanied by image coarsening; when edge preservation and filtering must both be considered, edge blurring inevitably occurs. Moreover, in some cases these methods still cannot maintain edges well, which degrades fusion quality. Kou [34] proposed a gradient domain GIF, in which the introduction of explicit first-order edge constraints defines a new edge-perception weight, so that the edges of an image are better preserved.
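As a reference point for the filters discussed above, a minimal version of the classic guided filter (the formulation with a fixed regularization parameter eps, whose fixed threshold the gradient domain GIF replaces with an edge-aware weight) can be sketched as follows; the box-filter radius and eps values here are illustrative choices.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def box_mean(img, r):
    """Mean over a (2r+1)x(2r+1) window with edge padding."""
    p = np.pad(img, r, mode="edge")
    return sliding_window_view(p, (2 * r + 1, 2 * r + 1)).mean(axis=(2, 3))

def guided_filter(I, p, r=4, eps=1e-4):
    """Classic guided filter: the output is locally a linear
    transform a*I + b of the guidance image I, fitted to input p."""
    mean_I, mean_p = box_mean(I, r), box_mean(p, r)
    var_I = box_mean(I * I, r) - mean_I ** 2
    cov_Ip = box_mean(I * p, r) - mean_I * mean_p
    a = cov_Ip / (var_I + eps)   # fixed eps: edges with var_I ~ eps get smoothed
    b = mean_p - a * mean_I
    return box_mean(a, r) * I + box_mean(b, r)

rng = np.random.default_rng(1)
I = rng.random((64, 64))
q = guided_filter(I, I, r=4, eps=1e-8)   # self-guidance with a tiny eps
```

With I as its own guide and eps close to zero, a approaches 1 and b approaches 0, so the filter returns the input nearly unchanged; raising eps smooths any region whose local variance is comparable to eps. This fixed-threshold behavior is exactly what the gradient domain GIF's edge-perception weight is designed to relax.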
Based on the gradient domain GIF, with its excellent edge-preserving properties, a new fusion method for MS and PAN images in the NSST domain is proposed. The MS and PAN images are decomposed by NSST to obtain coefficients at different frequencies. For the high-frequency coefficients in the NSST domain, an improved pulse coupled neural network (PCNN) model is used to obtain an initial firing map. Unlike previous methods that directly calculate the fusion decision map, the gradient domain GIF is used to optimize the firing map, and the fusion decision map is then calculated to guide the fusion of the high-frequency coefficients. For the low-frequency coefficients in the NSST domain, a fusion strategy based on morphological filter-based intensity modulation (MFIM) is adopted, and the gradient domain GIF performs edge refinement on the modulated low-frequency coefficients to obtain the low-frequency fusion result. Experimental results show that the proposed method effectively improves detail information and spatial continuity while maintaining excellent spectral quality.
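The paper's improved PCNN is specified later; purely to illustrate what a "firing map" is, the following simplified PCNN iterates a decaying-threshold/fire cycle and counts how often each pixel fires. Brighter stimulus regions fire more often, which is what makes the accumulated map usable as fusion evidence. All constants and the linking form here are arbitrary choices for the sketch, not the parameters of the paper's improved model.

```python
import numpy as np

def neighbor_sum(Y):
    """Sum of the 8-connected neighbors of each pixel."""
    Yp = np.pad(Y, 1)
    return (Yp[:-2, :-2] + Yp[:-2, 1:-1] + Yp[:-2, 2:]
          + Yp[1:-1, :-2]                + Yp[1:-1, 2:]
          + Yp[2:,  :-2] + Yp[2:,  1:-1] + Yp[2:,  2:])

def pcnn_firing_map(S, n_iter=30, beta=0.2, decay=0.7, V=20.0, E0=2.0):
    """Simplified PCNN: a pixel fires when its linked input exceeds a
    decaying dynamic threshold; firing resets the threshold high.
    Returns the per-pixel firing count over n_iter iterations."""
    Y = np.zeros_like(S)     # pulses from the previous step
    E = np.full_like(S, E0)  # dynamic threshold
    T = np.zeros_like(S)     # accumulated firing map
    for _ in range(n_iter):
        U = S * (1.0 + beta * neighbor_sum(Y))  # stimulus with linking
        Y = (U > E).astype(S.dtype)
        E = decay * E + V * Y                   # reset fired thresholds
        T += Y
    return T

S = np.full((16, 16), 0.1)
S[:, :8] = 1.0               # bright left half, dark right half
T = pcnn_firing_map(S)
```

In a fusion setting, such a map would be computed from, e.g., the absolute high-frequency coefficients of each source, and the source with the larger (filtered) firing count would win at each location; here the map is only computed for a synthetic stimulus.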