Saliency-Guided Nonsubsampled Shearlet Transform for Multisource Remote Sensing Image Fusion

The rapid development of remote sensing and space technology provides multisource remote sensing image data of the same area for earth observation. The information provided by these images is often complementary, yet multisource image fusion remains challenging. This paper proposes a novel multisource remote sensing image fusion algorithm that integrates the contrast saliency map (CSM) and the sum-modified-Laplacian (SML) in the nonsubsampled shearlet transform (NSST) domain. The NSST is utilized to decompose the source images into low-frequency and high-frequency sub-bands. The low-frequency sub-bands reflect the contrast and brightness of the source images, while the high-frequency sub-bands reflect their texture and details. Using this information, the contrast saliency map and SML fusion rules are applied to the corresponding sub-bands. Finally, the inverse NSST reconstructs the fused image. Experimental results demonstrate that the proposed multisource remote sensing image fusion technique performs well in terms of contrast enhancement and detail preservation.


Introduction
Remote sensing images play an important role in urban planning, environmental monitoring, and military defense [1]. As a basic step of target classification, detection, and recognition in remote sensing images, remote sensing image fusion has attracted more and more research interest across the world. Because remote sensing images of the same region are acquired at different incident wavelengths, multiband remote sensing images differ significantly. The high-band remote sensing image can provide an overall view of the scene, similar to optical imaging, while the low-band remote sensing image is relatively bleak but has deeper penetration. Remote sensing image fusion can integrate multiband remote sensing images into a comprehensive image, which is conducive to the recognition and observation of ground objects [1][2][3].
Multisource remote sensing image fusion is an information processing technology for fusing multisensor, multiplatform, and multispectral-band remote sensing data. The fused image integrates the different spatial, temporal, and spectral information of multiple sensors, preparing the data for further analysis and processing. Many image fusion methods have been proposed in recent decades; among them, image fusion algorithms based on the transform domain and on edge-preserving filters are widely used [4]. In terms of transform domain-based image fusion frameworks, the wavelet transform, discrete wavelet transform (DWT) [5], dual-tree complex wavelet transform (DTCWT) [5], dual-tree complex wavelet packet transform (DTCWPT) [6], framelet transform [7], curvelet transform [5], contourlet transform [8], nonsubsampled contourlet transform (NSCT) [9], shearlet transform [10], and nonsubsampled shearlet transform (NSST) [11], among others, have been adapted to the field of image fusion. Iqbal et al. [12] introduced a multifocus image fusion approach using a DWT and a guided image filter to improve the definition of the fused images. Aishwarya et al. [13] used a DTCWT and an adaptive combined clustered dictionary for visible and infrared image fusion to enhance the target information. Wang et al. [14] proposed a multispectral (MS) and panchromatic (PAN) image fusion technique based on the hidden Markov tree model in a complex tight framelet transform domain to improve the spatial resolution of the MS image while keeping the spectral information. Because the wavelet transform cannot capture the abundant directional information of remote sensing images and can introduce spatial distortion, the contourlet transform and the NSCT were introduced to resolve this shortcoming. Yang et al.
[15] proposed a remote sensing image fusion algorithm via a contourlet hidden Markov tree and a clarity-saliency-driven pulse coupled neural network (PCNN) model to enhance the edges and contours of fused remote sensing images. Li et al. [16] introduced an image fusion method using dynamic threshold neural P systems and the NSCT for multimodality medical imaging to improve the visual quality and fusion performance. Because the contourlet transform- and NSCT-based image fusion approaches are computationally complex, the shearlet transform and the NSST were proposed to increase computational efficiency. Because the shearlet transform lacks translation invariance, the NSST, as its improved version, has become more widely used in the field of image processing. Yin et al. [17] proposed an image fusion technique via the NSST and a parameter-adaptive pulse coupled neural network (PAPCNN) to improve the contrast and brightness of fused medical images. Wang et al. [18] introduced the nonsubsampled shearlet transform hidden Markov forest (NSST-HMF) model for pansharpening to improve the spatial resolution of hyperspectral images while preserving spectral features.
In terms of edge preserving filter-based image fusion approaches, the guided image filter, cross bilateral filter, and rolling guidance filter, etc., are widely used. Li et al. [19] first introduced the guided image filter for image fusion, for which the computational complexity is relatively low. Then, the combination of guided image filtering and other transform domain algorithms such as DTCWT, NSCT, and NSST is introduced into the field of image fusion, and good results are achieved. Shreyamsha et al. [20] introduced the cross bilateral filter for image fusion based on pixel significance to enhance the visual quality of the fused images. Jian et al. [21] proposed a multiscale image fusion method using a rolling guidance filter to preserve the details and suppress the artifacts of the fused images.
In this work, a novel remote sensing image fusion algorithm using a contrast saliency map (CSM) and the sum-modified-Laplacian (SML) in the NSST domain is proposed. The contrast saliency map-based fusion rule and the SML-based fusion rule are used to merge the low- and high-frequency sub-bands, respectively. Experimental results demonstrate the effectiveness of the proposed remote sensing image fusion method over traditional and state-of-the-art fusion algorithms in terms of qualitative and quantitative analysis.
The rest of this work is organized as follows: Section 2 reviews the related works, Section 3 describes the proposed remote sensing image fusion method, the experiments and results are summarized in Section 4, and the conclusions are provided in Section 5.

Related Works

Nonsubsampled Shearlet Transform
The nonsubsampled shearlet transform (NSST) is a nonsubsampled multiscale transform built on the theory of the shearlet transform [11,18]. The NSST decomposes an image into multiple scales and multiple directions via multiscale and multidirectional decompositions. First, the nonsubsampled pyramid (NSP) serves as the multiscale decomposition filter, splitting the image into one low-frequency sub-band and one high-frequency sub-band. The high-frequency sub-band is then decomposed by the shearing filter (SF) into multidirectional sub-bands. Because neither the NSP nor the SF performs subsampling, the NSST is shift-invariant. Figure 1 shows an example of a three-level NSST decomposition of a zoneplate image, where all images are displayed in the "jet" colormap and the direction numbers from coarser to finer are 4, 8, and 8. Figure 1a depicts the original zoneplate image, Figure 1b shows the low-frequency component, and Figure 1c-e show the high-frequency sub-band images with direction numbers 4, 8, and 8, respectively.
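The multiscale stage can be illustrated with a simple stand-in. The sketch below is an assumption for illustration, not the actual NSP/SF filter bank of the NSST: it builds an undecimated Gaussian-difference pyramid that shares the two properties emphasized above, namely that every sub-band keeps the full image size (shift invariance) and that the decomposition is perfectly invertible.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def nsp_decompose(img, levels=3):
    """Illustrative nonsubsampled pyramid: each level is the residual
    between two Gaussian smoothings; no subsampling is performed, so
    every sub-band keeps the full image size (shift-invariant)."""
    current = img.astype(np.float64)
    highs = []
    for level in range(levels):
        low = gaussian_filter(current, sigma=2.0 ** level)
        highs.append(current - low)  # high-frequency residual at this scale
        current = low                # pass the smoothed image to the next level
    return current, highs            # low-frequency band + high-frequency bands

def nsp_reconstruct(low, highs):
    """Perfect reconstruction: sum the low band and all residuals."""
    return low + sum(highs)

img = np.random.rand(64, 64)
low, highs = nsp_decompose(img)
rec = nsp_reconstruct(low, highs)
```

Because the pyramid is purely additive, `rec` equals `img` exactly, mirroring the invertibility that the inverse NSST relies on for image reconstruction.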

Proposed Fusion Method

In this section, a novel remote sensing image fusion method based on the NSST is proposed. The whole process consists of four parts: NSST decomposition, low-frequency sub-band fusion, high-frequency sub-band fusion, and inverse NSST image reconstruction. Suppose the input remote sensing images are A and B; the two images are decomposed up to N levels by the NSST to generate the decomposed sub-bands $L_A$, $H_A^{l,d}$ and $L_B$, $H_B^{l,d}$, where $L$ denotes the low-frequency sub-band and $H^{l,d}$ denotes the high-frequency sub-band at decomposition level $l$ and direction $d$.

Fusion of Low-Frequency Components

The low-frequency sub-bands present the brightness and contrast information of the source remote sensing images [22]. In this section, in order to preserve the contrast, the contrast saliency maps (CSM) of the low-frequency components are constructed from the brightness distribution. The contrast of an image denotes the difference between the lowest and highest brightness levels; where the difference in brightness is more significant, the contrast is higher. We can therefore infer that the brighter or darker a pixel value is relative to the average value of the image, the greater its contribution to the image contrast and the stronger its contrast saliency. The L2 norm is used to measure the deviation of each pixel value from the average value, expressing the saliency of each pixel. Applying the L2 norm to the low-frequency sub-bands $L_A$ and $L_B$ generates the contrast saliency maps $S_{L_A}$ and $S_{L_B}$:

$$S_{L_A}(x,y) = \mathrm{norm}\big(L_A(x,y) - \mathrm{mean}(L_A)\big), \quad (1)$$
$$S_{L_B}(x,y) = \mathrm{norm}\big(L_B(x,y) - \mathrm{mean}(L_B)\big), \quad (2)$$

where mean(·) denotes the average value of the image. The L2 norm is used to eliminate the effect of the sign, and the norm(·) function is defined as

$$\mathrm{norm}(X) = \sqrt{X^2}. \quad (3)$$

The weight matrices $W_{L_A}$ and $W_{L_B}$ of the low-frequency components are obtained by normalizing the saliency maps of the low-frequency sub-bands:

$$W_{L_A} = \frac{S_{L_A}}{S_{L_A} + S_{L_B}}, \quad (4) \qquad W_{L_B} = \frac{S_{L_B}}{S_{L_A} + S_{L_B}}, \quad (5)$$

The fused low-frequency sub-band is computed by the Hadamard product of the low-frequency components and the corresponding weight matrices:

$$L_F = W_{L_A} * L_A + W_{L_B} * L_B, \quad (6)$$

where $L_F$ represents the fused low-frequency component and * denotes the Hadamard product.
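The CSM-based low-frequency rule described above (saliency as absolute deviation from the band mean, normalized into weight matrices, then a Hadamard-weighted sum) can be sketched directly in NumPy. The small `eps` is an implementation assumption, not part of the paper's rule: it keeps the weights at 0.5/0.5 where both saliencies vanish, so the two bands are simply averaged there.

```python
import numpy as np

def fuse_low_frequency(LA, LB, eps=1e-12):
    """Contrast-saliency fusion of two low-frequency sub-bands.
    Saliency = absolute deviation of each pixel from the band mean;
    weights = normalized saliencies; fusion = Hadamard-weighted sum."""
    SA = np.abs(LA - LA.mean())            # contrast saliency map of L_A
    SB = np.abs(LB - LB.mean())            # contrast saliency map of L_B
    WA = (SA + eps) / (SA + SB + 2 * eps)  # weight matrix for L_A
    WB = (SB + eps) / (SA + SB + 2 * eps)  # weight matrix for L_B
    return WA * LA + WB * LB               # element-wise (Hadamard) fusion

LA = np.array([[0.2, 0.8], [0.5, 0.5]])
LB = np.array([[0.9, 0.1], [0.5, 0.5]])
LF = fuse_low_frequency(LA, LB)            # salient (bright/dark) pixels dominate
```

In the toy example, the pixels far from the mean in either band receive larger weights, while the pixels equal to both band means are averaged.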

Fusion of High-Frequency Components
The high-frequency components contain the texture information and details. In this section, the sum-modified-Laplacian (SML) is used to process the high-frequency sub-bands. For a local window of size (2P + 1)(2Q + 1), the SML is defined as [23]:

$$ML^{l,d}(x,y) = \big|2H^{l,d}(x,y) - H^{l,d}(x-\mathrm{step},y) - H^{l,d}(x+\mathrm{step},y)\big| + \big|2H^{l,d}(x,y) - H^{l,d}(x,y-\mathrm{step}) - H^{l,d}(x,y+\mathrm{step})\big|, \quad (7)$$

$$SML^{l,d}(x,y) = \sum_{p=-P}^{P}\sum_{q=-Q}^{Q} ML^{l,d}(x+p, y+q), \quad (8)$$

where step denotes the changeable interval between the high-frequency coefficients; it is usually set to 1.

The fused high-frequency sub-bands are computed by

$$H_F^{l,d}(x,y) = \begin{cases} H_A^{l,d}(x,y), & SML_A^{l,d}(x,y) \ge SML_B^{l,d}(x,y) \\ H_B^{l,d}(x,y), & \text{otherwise}, \end{cases} \quad (9)$$

where $H_F^{l,d}$ denotes the fused high-frequency components. The whole procedure of the proposed remote sensing image fusion method is summarized in Algorithm 1.

Algorithm 1: The proposed remote sensing image fusion method.
Step 1: NSST decomposition. The source images A and B are decomposed by the NSST into the sub-bands $L_A$, $H_A^{l,d}$ and $L_B$, $H_B^{l,d}$.
Step 2: low-frequency fusion. The fused low-frequency band $L_F$ is obtained by Equation (6).
Step 3: high-frequency fusion. The fused high-frequency bands $H_F^{l,d}$ are computed by Equation (9).
Step 4: inverse NSST and image reconstruction. The fused image F is reconstructed by the inverse NSST performed on the fused low- and high-frequency bands $L_F$, $H_F^{l,d}$.

Experimental Results and Discussion
In this section, in order to demonstrate the effectiveness of the proposed multisource remote sensing image fusion method via the NSST, public data sets (https://sites.google.com/view/durgaprasadbavirisetti/datasets (accessed on 15 December 2020)) are used for simulation, and several state-of-the-art image fusion algorithms are adopted for comparison, namely image fusion based on a guided image filter (GFF) [19], image matting for the fusion of multifocus images (IFM) [24], image fusion using a dual-tree complex wavelet transform (DTCWT) [5], curvelet transform-based image fusion (CVT) [5], image fusion utilizing phase congruency (PC) [25], structure-aware image fusion (SAIF) [26], fusing infrared and visible images of different resolutions via a total variation model (DRTV) [27], multimodal image seamless fusion (MISF) [28], and parameter-adaptive pulse-coupled neural network-based image fusion via a nonsubsampled shearlet transform (NSST) [17]. To ensure a fair comparison, the parameters of the comparison algorithms are kept consistent with the original published papers. In the proposed fusion technique, the number of NSST decomposition levels is four, and the direction numbers from coarser to finer are 8, 8, 16, and 16. The selected remote sensing image data sets are shown in Figure 3.
In order to objectively assess the fusion performance of the different fusion techniques, many image fusion evaluation indexes have been introduced in recent years. A single evaluation index cannot fully characterize the quality of fused images in quantitative assessment. Therefore, to make a comprehensive evaluation of the fusion images, six popular fusion evaluation metrics are introduced in this section, namely visual information fidelity for fusion (VIFF) [29][30][31][32][33], Q_S [34], average gradient (AG) [20,35,36], correlation coefficient (CC) [20,37,38], spatial frequency (SF) [20,39,40,41], and Q_W [34,42]. For all six metrics, the higher the value of the evaluation index, the better the fusion performance. The experimental results are depicted in Figures 4-7 and Tables 1-5.
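Two of the six metrics, AG and SF, depend only on the fused image and are simple to compute. The sketch below follows common formulations from the fusion literature; exact definitions vary slightly across papers, so these are illustrative rather than the precise variants used in the experiments.

```python
import numpy as np

def average_gradient(img):
    """Average gradient (AG): mean local gradient magnitude (sharpness proxy)."""
    img = img.astype(np.float64)
    gx = np.diff(img, axis=1)[:-1, :]        # horizontal first differences
    gy = np.diff(img, axis=0)[:, :-1]        # vertical first differences
    return float(np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2.0)))

def spatial_frequency(img):
    """Spatial frequency (SF): RMS of row and column first differences."""
    img = img.astype(np.float64)
    rf = np.mean(np.diff(img, axis=1) ** 2)  # row frequency term
    cf = np.mean(np.diff(img, axis=0) ** 2)  # column frequency term
    return float(np.sqrt(rf + cf))

ramp = np.tile(np.arange(8, dtype=float), (8, 1))  # left-to-right gradient image
ag, sf = average_gradient(ramp), spatial_frequency(ramp)
```

Both metrics are zero for a constant image and grow with local intensity variation, which is why higher values indicate better preservation of detail in the fused result.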

Qualitative Analysis
In this section, the fusion results obtained by the proposed method and the compared results calculated by nine other fusion algorithms are given in Figures 4-7. Panels (a) and (b) of Figures 4-7 show the source images A and B, respectively. As seen from Figure 4, the GFF, DTCWT, CVT, and DRTV algorithms decrease the contrast of the fusion images, making some details invisible (see Figure 4c,e,f,i). The IFM, SAIF, and MISF methods appear to generate a block effect and artifacts, affecting the observation of the fused images (see Figure 4d,h,j). The PC algorithm makes the image distorted (see Figure 4g). The NSST technique provides overly high brightness (see Figure 4k). The proposed fusion technique can provide a high-definition image and preserve spatial detail information in the fused image (see Figure 4l).
From Figure 5, we can see that the GFF, IFM, and DRTV methods make the fused image darker in some regions (see Figure 5c,d,i). The DTCWT and CVT methods make the fused images better compared to the previous methods (see Figure 5e,f). The PC approach provides a poor fusion performance (see Figure 5g). The SAIF and MISF algorithms introduce artifacts (see Figure 5h,j). The NSST method makes the fused image brighter, and it is not conducive to the acquisition of target information from the fused image (see Figure 5k). The proposed fusion method provides a better fusion effect (see Figure 5l).
From Figure 6, it can be seen that the GFF, IFM, DTCWT, and CVT algorithms decrease the contrast and make the images darker (see Figure 6c-f). The PC technique appears to generate a block effect (see Figure 6g). The SAIF, MISF, and NSST methods produce artifacts, and the brightness is over-enhanced in some regions (see Figure 6h,j,k). The DRTV method produces over-enhanced brightness in some regions and an overly smooth fusion image (see Figure 6i). The proposed algorithm can enhance the contrast and definition, which is helpful in obtaining the target information from the fused image (see Figure 6l).
From Figure 7, we can see that the GFF, IFM, SAIF, and MISF algorithms make the fusion image darker (see Figure 7c,d,h,j). The DTCWT and CVT methods produce a good fusion visual effect (see Figure 7e,f). The PC, DRTV, and NSST techniques produce distortion and artifacts (see Figure 7g,i,k). The proposed fusion technique can produce relatively higher contrast and preserve the texture information (see Figure 7l).
In summary, the subjective assessment of the fusion results demonstrates the superior performance of the proposed remote sensing image fusion technique when compared with the state-of-the-art fusion algorithms.

Table 3. The objective evaluation of the methods in Figure 6.


Quantitative Analysis
In this section, the six indexes (VIFF, Q S , AG, CC, SF, Q W ) are used to evaluate the fusion results quantitatively. The data for the evaluation metrics of the different fusion algorithms for Figures 4-7 are shown in Tables 1-4. From Table 1, we can see that the value of VIFF as computed by the proposed method is slightly worse than the NSST algorithm, while the data for the other five metrics as calculated by the proposed fusion technique are the best. From Table 2, we can see that the metric values given by the proposed method are the largest except for the metric of CC. From Table 3, the values of CC and Q W as computed by the proposed technique are a little smaller than the corresponding values obtained by the CVT and NSST methods, respectively. From Table 4, we can see that all six values of the metrics achieved by the proposed method are the best compared to the other fusion methods.
In order to demonstrate the effectiveness of the proposed method, the sixteen image groups given in Figure 3 are simulated, and the average values of their objective evaluation are given in Table 5. The line charts of the objective metric data in Table 5 are given in Figure 8, and the proposed method has the best values for all metrics. Therefore, it is demonstrated that better fusion performance can be generated by the proposed remote sensing image fusion work.


Conclusions
In this work, a novel saliency-guided nonsubsampled shearlet transform for multisource remote sensing image fusion is introduced. First, the input images are transformed from the spatial domain to the shearlet domain via the nonsubsampled shearlet transform. Second, the contrast saliency maps and corresponding weight matrices are introduced for fusing the low-frequency coefficients, and the SML-based fusion rule is performed on the high-frequency coefficients, which improves the contrast and definition of the fused images. To prove the universality of the proposed fusion algorithm, sixteen sets of remote sensing images are simulated, and six image fusion evaluation indexes are utilized for quantitative analysis. From the experimental results, we can conclude that the proposed fusion approach has superior performance compared to the state-of-the-art fusion methods. In future work, we will extend the algorithm to panchromatic and multispectral [43][44][45][46][47][48], and hyperspectral and multispectral image fusion [49,50].