Article

A Novel Dual-Branch Pansharpening Network with High-Frequency Component Enhancement and Multi-Scale Skip Connection

1 College of Computer Science and Technology, Zhengzhou University of Light Industry, Zhengzhou 450002, China
2 School of Computer Science, Nanjing University of Information Science and Technology, Nanjing 210044, China
* Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(5), 776; https://doi.org/10.3390/rs17050776
Submission received: 13 January 2025 / Revised: 12 February 2025 / Accepted: 21 February 2025 / Published: 23 February 2025

Abstract

In recent years, pansharpening methods based on deep learning have shown great advantages. However, these methods still fall short in considering the differences and correlations between multispectral (MS) and panchromatic (PAN) images. In response to this issue, we propose a novel dual-branch pansharpening network with high-frequency component enhancement and a multi-scale skip connection. First, to enhance the correlations, the high-frequency branch consists of the high-frequency component enhancement module (HFCEM), which enhances the high-frequency components through the multi-scale block (MSB) and thereby obtains high-frequency weights that accurately capture the high-frequency information in the MS and PAN images. Second, to address the differences, the low-frequency branch consists of the multi-scale skip connection module (MSSCM), which captures multi-scale features from coarse to fine through multi-scale convolution and fuses these multilevel features through the designed skip connection mechanism to fully extract the low-frequency information from the MS and PAN images. Finally, qualitative and quantitative experiments are performed on the GaoFen-2, QuickBird, and WorldView-3 datasets. The results show that the proposed method outperforms state-of-the-art pansharpening methods.


1. Introduction

Abundant spectral and spatial information is contained within high-resolution multispectral (HRMS) images, which can be employed in diverse practical applications such as environmental monitoring, urban planning, and other fields [1,2,3]. However, owing to the limitations of satellite sensors, it is difficult to obtain images with both high spectral resolution and high spatial resolution from a single sensor. An effective way to address this problem is pansharpening, which fuses multispectral (MS) and panchromatic (PAN) images to obtain HRMS images. The resulting images exhibit superior spatial resolution and spectral fidelity [4], and pansharpening serves as an important preprocessing step for many imaging tasks [5].
Over the years, numerous approaches have been introduced in pansharpening, which are usually classified into four main classes: component substitution (CS), multiresolution analysis (MRA), variational optimization (VO), and deep learning (DL).
CS-based methods decompose the MS images into spectral and spatial components in a transform domain, replace the spatial components with the PAN image, and then obtain the HRMS images by inverse transformation. Representative methods include intensity hue saturation (IHS) [6], Gram–Schmidt (GS) [7], principal component analysis (PCA) [8], and partial replacement adaptive CS (PRACS) [9]. Although the CS-based approach is simple in principle and offers high spatial fidelity, the replacement process destroys the structure of the original MS information, which may result in spectral distortion.
The core concept behind MRA methods is to extract spatial information from the PAN image and then inject it into the MS images to generate the HRMS images. Representative MRA-based pansharpening methods include the Laplacian pyramid [10], Wavelet [11], SFIM [12], HPF [13], and the nonsubsampled contourlet transform [14]. Vivone et al. [15] introduced a regression-based pansharpening method with full-scale injection coefficients. Alparone et al. [16] introduced an improved additive wavelet luminance scaling method, which achieved fast and high performance. Both MRA-based and CS-based approaches are easy to implement, and MRA methods preserve spectral information reasonably well. However, the injection of spatial information can cause a serious loss of spatial detail.
VO-based methods hinge on solving optimization problems modeled with a priori knowledge, and the pansharpening results are obtained by an efficient solving algorithm. They mainly include sparse representation [17,18] and model-based approaches [19,20]. Zhang et al. [21] introduced a convolutional sparse representation technique that represents the PAN and MS images as linear combinations of sparse coefficients and dictionaries, respectively, and then fuses the two by a convolution operation. VO-based methods obtain better fusion results than the previous two classes, but they are more time-consuming, and their a priori models do not adequately preserve spatial information, so the generated images often exhibit obvious spatial distortions.
Over the past several years, benefiting from the strong feature extraction and nonlinear mapping abilities of DL, many DL-based pansharpening methods have emerged. Huang et al. [22] first applied DL to pansharpening and proposed a self-encoding pansharpening method based on sparse denoising, achieving a more desirable fusion effect. Drawing inspiration from SRCNN [23], Masi et al. [24] presented the PNN network, consisting of three convolutional layers, whose performance gains over traditional approaches are notable. Yang et al. [25] then presented PanNet, which introduces skip connections and trains the network in the high-pass domain; its performance is significantly improved by injecting the learned spatial information into the MS images. Yuan et al. [26] proposed MSDCNN, which improves model performance by using multi-scale convolution kernels to process both deep and shallow information. Deng et al. [27] presented FusionNet by combining a traditional method with a CNN, fully extracting the spatial information of the images using a difference strategy. Zhang et al. [28] introduced a triple-double network, which gradually injects spatial details from the PAN image into the MS images, resulting in a high-spatial-resolution output. Tu et al. [29] proposed MMDN, which extracts spatial information at different scales through multi-distillation residual blocks and multi-scale dilation blocks, and further improves the ability to capture detailed information by extracting residual features. Lei et al. [30] proposed a multi-scale hierarchical pansharpening network with adaptive optimization, which captures multi-scale features through grouped dilated blocks and fuses multilevel hierarchical information for feature enhancement.
Cheng et al. [31] proposed DMFANet, which performs multilevel feature fusion within the network architecture to maximize the utilization of features at different levels. Jian et al. [32] proposed the MMFN network for pansharpening, which fully extracts spectral and spatial information at different levels by combining multi-scale and multi-stream structures. Lu et al. [33] proposed the AWFLN network, which fully extracts detail information by embedding a multi-scale convolution module and improves the accuracy of detail extraction by adding feature learning blocks that adaptively learn feature weights. Lu et al. [34] proposed SEWformer, which utilizes a cross-scale interactive encoder and a spatial-spectral enhanced window transformer to fully extract detail information. Zhao et al. [35] presented PRNet with adaptive frequency adjustment to address the spatial and frequency differences between MS and PAN images, capturing contextual information by fusing multi-scale frequency-domain features.
To generate high-quality images, we propose a novel dual-branch pansharpening network with high-frequency component enhancement and a multi-scale skip connection. We design a high-frequency branch consisting of the high-frequency component enhancement module (HFCEM) to acquire high-frequency features from the MS and PAN images, and a low-frequency branch consisting of the multi-scale skip connection module (MSSCM) to extract the low-frequency features. Through two independent yet synergistic branches, the network achieves accurate extraction and fusion of image information and obtains better fusion results in both reduced- and full-scale experiments.
The main contributions of the paper are as follows:
  • We propose a novel dual-branch fusion network. While fully extracting high- and low-frequency information, it solves the problem that traditional networks ignore the differences and correlations between PAN and MS images.
  • In the high-frequency branch, we design HFCEM. Multi-scale block (MSB) processing is performed on the extracted high-frequency component to generate weights and fully extract high-frequency information. The proposed HFCEM effectively utilizes the correlations between images and exhibits excellent performance in recovering texture details.
  • In the low-frequency branch, we further design MSSCM. This module effectively captures features at different scales by combining multi-scale convolution and skip connections, and reuses shallow features to better extract the overall contour information.

2. Related Work

Although the above structures achieve good results, they remain deficient in considering the differences and correlations of the images. The PAN and MS images contain complementary and redundant information, so constructing efficient DL models is critical for pansharpening. Figure 1 illustrates the DL frameworks for pansharpening. The method in Figure 1a, adopted by existing methods [25,33,34], directly regards the MS images as low-frequency information and disregards their structural information, leading to spectral distortion. The method in Figure 1b, adopted by existing methods [24,26], learns features directly from the concatenated PAN and MS images, ignoring the differences between the two images and yielding unspecific learned features. The method in Figure 1c, adopted by an existing method [31], uses two independent feature extraction branches but ignores the correlations between the images, resulting in insufficient extracted information. In summary, the core challenge is how to fully and accurately extract image features while also accounting for the unique differences and correlations between the images.
In response to the above problems, we present a novel framework, as shown in Figure 1d. The high-frequency and low-frequency information from MS and PAN images are obtained using module 1 and module 2 in two branches, respectively. The two branches can fully extract the feature information of the images and take into account the differences and correlations between the images.

3. Proposed Method

The overall framework of the model is shown in Figure 2. First, when extracting high-frequency information, the method is no longer limited to the PAN image alone but also draws on the MS images, effectively avoiding the insufficient feature extraction that may result from a single source. At the same time, we no longer directly treat the MS images as low-frequency information, avoiding the blurring caused by ignoring inter-image differences.
Subsequently, different information is extracted through these two branches. The high-frequency branch comprises two layers of HFCEM, which are specifically designed to capture detailed information. The low-frequency branch consists of two layers of MSSCM, aimed at extracting information at low frequencies. Finally, after two levels of feature extraction, the reconstruction of the HRMS images is achieved by fusing the different information obtained from the two branches.
It is particularly important to perform data preprocessing prior to feature extraction, as it provides rich and detailed information for the subsequent feature extraction. For convenience of modeling, we define the PAN image as $P$ and the MS images as $M$. The image preprocessing module consists of three separate branches, $D_1$, $D_2$, and $D_3$, defined as follows.
$D_2 = \mathrm{Concat}(P, D_1)$
$D_3 = P_D - D_1$
where $D_1$ is the MS image upsampled by a factor of four using a 23-tap polynomial kernel, $P_D$ denotes the PAN image replicated along the spectral dimension, and $\mathrm{Concat}(\cdot)$ refers to the concatenation function.
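This preprocessing can be summarized in a short PyTorch sketch, given below. It assumes bicubic interpolation as a stand-in for the 23-tap polynomial kernel, and the function name, variable names, and tensor shapes are ours.

```python
# Minimal sketch of the preprocessing step, assuming PyTorch tensors and bicubic
# interpolation in place of the 23-tap polynomial kernel mentioned in the text.
import torch
import torch.nn.functional as F

def preprocess(ms, pan):
    """ms: (B, C, h, w) low-resolution MS image; pan: (B, 1, 4h, 4w) PAN image."""
    # D1: the MS image upsampled by a factor of four.
    d1 = F.interpolate(ms, scale_factor=4, mode="bicubic", align_corners=False)
    # D2: channel-wise concatenation of the PAN image and D1.
    d2 = torch.cat([pan, d1], dim=1)
    # D3: the PAN image replicated along the spectral dimension minus D1.
    p_d = pan.repeat(1, ms.shape[1], 1, 1)
    d3 = p_d - d1
    return d1, d2, d3
```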

3.1. HFCEM

Both the PAN and MS images contain high-frequency information, but extracting this information adequately is a challenge. Inspired by [27,36], we find that there is still room for optimization in extracting high-frequency information. As shown in Figure 3, we propose an improved structure for extracting high-frequency information.
The inputs to this structure are $D_3$ and $D_1$, where $D_3$ is used as the high-frequency component. Subsequently, the MSB, consisting of 3 × 3, 5 × 5, and 7 × 7 convolutions, is utilized to obtain the weights of the high-frequency information. Next, these weights are multiplied with $D_1$ pixel by pixel to obtain a richly detailed output image $H_1$. In the high-frequency branch, two HFCEM layers are used to enhance the extraction of image features. With the above structure, the high-frequency information of the image is fully extracted.
$S_1 = Conv_4(\mathrm{Concat}(Conv_1(D_3), Conv_2(D_3), Conv_3(D_3)))$
$H_1 = S_1 \otimes D_2$
where $Conv_1$, $Conv_2$, $Conv_3$, and $Conv_4$ represent convolutions with kernel sizes of 3, 5, 7, and 1, respectively, and $\otimes$ denotes pixel-by-pixel multiplication.
$H_2 = \mathrm{HFCEM}(H_1, D_3)$
where $H_2$ represents the output of the second HFCEM layer.
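The sketch below shows one plausible PyTorch realization of an HFCEM layer under stated assumptions: all features carry the same channel width, the 1 × 1 fusion convolution produces the weights directly without an activation, and the weights modulate the companion feature map ($D_1$ or $H_1$ in the description above). None of these implementation details are fixed by the paper.

```python
# One possible realization of an HFCEM layer (a sketch, not the authors' code):
# parallel 3x3/5x5/7x7 convolutions form the multi-scale block (MSB), a 1x1
# convolution fuses them into high-frequency weights, and the weights modulate
# the companion feature map pixel by pixel. Equal channel widths are an assumption.
import torch
import torch.nn as nn

class HFCEM(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv3 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv5 = nn.Conv2d(channels, channels, kernel_size=5, padding=2)
        self.conv7 = nn.Conv2d(channels, channels, kernel_size=7, padding=3)
        self.fuse = nn.Conv2d(3 * channels, channels, kernel_size=1)

    def forward(self, hf, feat):
        # Multi-scale block applied to the high-frequency component (D3).
        s = self.fuse(torch.cat([self.conv3(hf), self.conv5(hf), self.conv7(hf)], dim=1))
        # Pixel-by-pixel modulation of the companion feature map (D1 or H1).
        return s * feat
```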

3.2. MSSCM

Because of the differences in information between MS and PAN images, directly treating the MS images as low-frequency information and summing them with the high-frequency information causes their unique information to be ignored, and the fusion result suffers from spectral distortion. To solve this issue and take full advantage of the complementary strengths of the two, we use multi-scale skip connection extraction to capture low-frequency information. The structure in Figure 4a can extract deeper features but does not fully utilize the information between different levels. The multi-scale structure in Figure 4b can capture more detailed information from the image, but its extraction of low-frequency information is still lacking.
Combining the above two structures, we propose MSSCM, which aims to comprehensively capture the low-frequency information of images. As shown in Figure 4c, the module employs three convolution kernels of decreasing size in series to effectively extract low-frequency features at different scale levels. The deep fusion of multilayer features is achieved by concatenating the convolution results of all levels, and the skip connection mechanism reduces the loss of low-frequency information during convolution.
First, $D_1$ undergoes feature extraction with convolution kernels of sizes 7 × 7, 5 × 5, and 3 × 3. Subsequently, the results of each convolution are concatenated to reuse shallow features and fuse multilayer features. Finally, a 1 × 1 convolution adjusts the number of channels while reducing the model parameters, and $M_1$ is obtained through a skip connection, which effectively mitigates the loss of information due to increased network depth.
$C_i = \begin{cases} D_1, & i = 0 \\ \mathrm{ReLU}(Conv_j(C_{i-1})), & i = 1, 2, 3 \end{cases}$
$M_1 = \mathrm{ReLU}(Conv(\mathrm{Concat}(C_1, C_2, C_3))) + C_0$
where $Conv_j$ represents a convolution whose kernel size takes the values 7, 5, and 3 for $j = 1, 2, 3$, and $C_i$ denotes the output of the corresponding convolution operation.
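A minimal PyTorch sketch of one MSSCM layer following these equations is given below; the feature width and the placement of the ReLU after the 1 × 1 fusion convolution are assumptions.

```python
# A minimal sketch of one MSSCM layer: three serial convolutions with decreasing
# kernel sizes (7, 5, 3), concatenation of the intermediate outputs, a 1x1
# channel-adjustment convolution, and a skip connection back to the input.
import torch
import torch.nn as nn

class MSSCM(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv7 = nn.Conv2d(channels, channels, kernel_size=7, padding=3)
        self.conv5 = nn.Conv2d(channels, channels, kernel_size=5, padding=2)
        self.conv3 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.fuse = nn.Conv2d(3 * channels, channels, kernel_size=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        c0 = x
        c1 = self.relu(self.conv7(c0))  # C_1
        c2 = self.relu(self.conv5(c1))  # C_2
        c3 = self.relu(self.conv3(c2))  # C_3
        # Concatenate multi-level features, adjust channels, and add the skip connection.
        return self.relu(self.fuse(torch.cat([c1, c2, c3], dim=1))) + c0
```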
To fully exploit the potential correlations between the branches and extract the low-frequency information more efficiently, we sum the outputs of the first HFCEM and MSSCM layers. The summed result serves as the input to the second MSSCM layer, which enriches the information. After two levels of feature extraction, the fused HRMS image is finally obtained by fusing the features of the two branches.
$M_2 = \mathrm{MSSCM}(M_1 + H_1)$
$\mathrm{HRMS} = D_1 + Conv(\mathrm{ReLU}(Conv(H_2 + M_2)))$
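The following sketch wires the two branches together according to the equations above, reusing the HFCEM and MSSCM sketches. Feeding $D_1$ as the feature modulated by the high-frequency weights follows the prose description; the two-convolution reconstruction head and equal channel widths are assumptions.

```python
# Illustrative wiring of the two branches into the final HRMS output, reusing the
# HFCEM and MSSCM sketches above; channel widths are illustrative only.
import torch.nn as nn

class DualBranchNet(nn.Module):
    def __init__(self, channels: int = 4):
        super().__init__()
        self.hfcem1, self.hfcem2 = HFCEM(channels), HFCEM(channels)
        self.msscm1, self.msscm2 = MSSCM(channels), MSSCM(channels)
        self.recon = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, d1, d3):
        h1 = self.hfcem1(d3, d1)         # high-frequency branch, layer 1
        m1 = self.msscm1(d1)             # low-frequency branch, layer 1
        h2 = self.hfcem2(d3, h1)         # H2 = HFCEM(H1, D3)
        m2 = self.msscm2(m1 + h1)        # M2 = MSSCM(M1 + H1)
        return d1 + self.recon(h2 + m2)  # HRMS = D1 + Conv(ReLU(Conv(H2 + M2)))
```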

3.3. Loss Function

Following the network architecture design, we use the mean squared error (MSE) as the loss function, which is defined as follows:
$\mathrm{Loss} = \frac{1}{N}\sum_{j=1}^{N} \left\lVert GT_j - H_{Net}(M_j, P_j) \right\rVert^2$
where $j$ indexes the training samples, $N$ is the total number of training samples, $H_{Net}$ denotes the proposed network, and $GT$ denotes the ground-truth (GT) images.
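In PyTorch, this objective reduces to the standard MSE criterion, as sketched below; the averaging over pixels differs from the formula above only by a constant factor, and the variable names are ours.

```python
import torch.nn as nn

# Mean squared error between the network output and the ground truth; PyTorch
# averages over all elements, which matches the formula up to a constant factor.
criterion = nn.MSELoss()
# loss = criterion(model(d1, d3), gt)
```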

4. Experiments and Results

This section outlines the datasets for experimentation, the comparison methods, the evaluation index, and the parameter configurations for the conducted tests. The presented method is compared with other approaches by using three datasets. Finally, the efficacy of each introduced module is confirmed through ablation experiments.

4.1. Experimental Design

This section presents an overview of the dataset characteristics and details. We used the published PanCollection dataset [37] for training and testing, employing the GaoFen-2, QuickBird, and WorldView-3 datasets to verify the robustness of the network. The MS images of the GaoFen-2 and QuickBird datasets include four bands, and the MS images of WorldView-3 contain eight bands. Table 1 lists information about the datasets, including the spatial resolution of the remote sensing images and the number of images in the training and validation sets. Figure 5 presents the MS and PAN images used in the experiments. Both reduced-scale and full-scale experiments are performed on the three datasets.

4.2. Methods of Comparison

We chose eleven methods for comparison on the three datasets introduced above: GS [7], PRACS [9], Wavelet [11], HPF [13], PNN [24], MSDCNN [26], FusionNet [27], TDNet [28], AWFLN [33], SEWformer [34], and PRNet [35].
The traditional methods are implemented in MATLAB 2018b. All DL-based methods are implemented in Python 3.6 with PyTorch 1.9.0 on a PC with an NVIDIA GeForce RTX 3060 GPU. The training parameters of the DL-based pansharpening methods are as follows: the batch size is 16, the number of epochs is 400, the initial learning rate is 0.0003, and the learning rate decays every 100 epochs with a decay rate of 0.8.
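For reference, these training settings translate into the following PyTorch sketch. The use of the Adam optimizer is our assumption (the optimizer is not named here), and `train_loader` stands for a DataLoader over preprocessed PanCollection batches.

```python
# Training configuration matching the reported settings: batch size 16, 400 epochs,
# initial learning rate 3e-4, decayed by a factor of 0.8 every 100 epochs.
import torch
import torch.nn as nn

model = DualBranchNet(channels=4)   # sketch from Section 3
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.8)

for epoch in range(400):
    for d1, d3, gt in train_loader:  # batches of 16 training samples (placeholder loader)
        optimizer.zero_grad()
        loss = criterion(model(d1, d3), gt)
        loss.backward()
        optimizer.step()
    scheduler.step()
```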

4.3. Evaluation Indicators

To objectively evaluate the experimental results, we adopted two strategies to quantitatively assess the fusion results. In this study, the reduced-scale experiment reference metrics include the spectral angle mapper (SAM) [38], the relative global synthesis error (ERGAS) [39], the correlation coefficient (CC) [40], the universal image quality index (Q) [41], and the extended version of Q (Q2n) [42].
SAM represents the angle between the corresponding pixels of the HRMS and GT images, which is used to assess their similarity. A lower SAM value signifies reduced spectral distortion, thus the best value is 0. ERGAS is used to assess the spectral retention of the image, reflecting the situation of spectral distortion. A smaller value of ERGAS indicates a smaller spectral loss, ideally 0. CC measures the similarity between the two images and indicates their geometric distortion, and a higher CC value signifies a better fusion effect. Q is used to evaluate quality in a variety of imaging applications, and Q2n is a widely used quality evaluation metric in generalization processing. The optimal values for Q and Q2n are 1.
The quantitative criteria for the full-scale experiments are the no-reference quality index (QNR) [43], $D_\lambda$, and $D_s$. $D_\lambda$ is the spectral quality index that evaluates the spectral quality of the image, and $D_s$ measures spatial distortion. The closer these two metrics are to 0, the higher the quality of the fused image. QNR assesses the overall quality of the generated images, and its optimal value is 1. The reported quantitative values are averages over the experiments.
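The two reference metrics we rely on most, SAM and ERGAS, can be computed as in the sketch below for NumPy arrays of shape (H, W, C). The formulas follow the standard definitions, and the resolution ratio of 4 between the PAN and MS images is an assumption of this example.

```python
import numpy as np

def sam(fused, gt, eps=1e-8):
    # Mean spectral angle (in degrees) between corresponding pixel vectors.
    dot = np.sum(fused * gt, axis=-1)
    norms = np.linalg.norm(fused, axis=-1) * np.linalg.norm(gt, axis=-1) + eps
    return np.degrees(np.mean(np.arccos(np.clip(dot / norms, -1.0, 1.0))))

def ergas(fused, gt, ratio=4):
    # 100/ratio * sqrt(mean over bands of (RMSE_b / mean_b)^2).
    rmse = np.sqrt(np.mean((fused - gt) ** 2, axis=(0, 1)))
    means = np.mean(gt, axis=(0, 1))
    return 100.0 / ratio * np.sqrt(np.mean((rmse / means) ** 2))
```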

4.4. Reduced-Scale Experiments

We designed reduced-scale experiments to validate the superiority of the proposed method on the GaoFen-2, QuickBird, and WorldView-3 datasets, with MS and PAN images of 16 × 16 and 64 × 64 pixels, respectively. In the quantitative results, the best value of each metric is shown in bold so that the corresponding method can be easily identified. In the qualitative results, to clearly exhibit the details within the fused images, part of each fused image is boxed in red and enlarged three times. The error maps show how close the fused image is to the GT image; the more blue the map, the better the fusion result.
Table 2 shows the quantitative results on the GaoFen-2 dataset. According to the table, the GS method has the highest SAM and ERGAS values, while the PRACS and HPF methods achieve better results among the traditional methods. The DL-based methods exhibit notable advantages in SAM, ERGAS, CC, Q, and Q2n. In contrast, PNN and MSDCNN fall short in pansharpening, probably because their relatively simple models produce features that are not sufficiently distinct. Although AWFLN, SEWformer, and PRNet perform better than the above three methods, our proposed network better preserves spatial and spectral information.
Figure 6 shows the pseudo-color maps for the GaoFen-2 dataset. We chose buildings as the local detail, and the figure shows that the GS method exhibits noticeable spectral distortions, such as the color of the field. The Wavelet method shows spatial aberration, especially for roads and buildings. The results of the DL-based methods are much clearer, and the local zoom does not show obvious blurring. As can be seen in the error maps in Figure 7, PNN, MSDCNN, FusionNet, and TDNet have large errors with respect to the GT, and the spatial details are not well recovered. The SEWformer method and our method have smaller errors. Our proposed method fully considers the differences and correlations between images and attains the best results: the detail information is well handled and closest to the GT images.
Table 3 shows the index values of the various methods on the QuickBird dataset. Among the traditional methods, the SAM and ERGAS values of the GS and Wavelet methods are significantly weaker than those of the other methods, indicating weaker spatial and spectral fidelity. The metric values of the DL-based methods do not differ much, but the metric values of our proposed method are superior to those of the other compared methods.
Figure 8 and Figure 9 show the visualization results and error maps for the QuickBird dataset. The Wavelet method has the least satisfactory fusion results, with obvious spectral aberrations. In contrast, the DL-based methods do not show significant spectral distortion. In Figure 9, PRNet and the proposed method have fewer residuals, but the PRNet method still needs to improve in spatial detail enhancement. Overall, the proposed method is excellent at preserving both spatial and spectral information.
The quantitative evaluation results for the WorldView-3 dataset are shown in Table 4. The Wavelet method has the worst metric values, and the DL-based methods achieve better performance than the traditional methods. Although the ERGAS value of the proposed method is second only to that of the PRNet method, in terms of the overall indices our proposed method performs best. In Figure 10, the fused images of the PRACS and HPF methods are blurred, and the Wavelet method shows spectral distortion. The results of FusionNet and TDNet show insufficient extraction of detailed information. In contrast, PRNet and the proposed method are closer to the GT images. As shown in Figure 11, we further compare the methods using error maps. Overall, our proposed method performs best in both the qualitative and quantitative results.

4.5. Full-Scale Experiments

In this section, full-scale experiments are performed on the GaoFen-2, QuickBird, and WorldView-3 datasets, with MS and PAN images of 128 × 128 and 512 × 512 pixels, respectively. Table 5 shows the full-scale no-reference metric values for the GaoFen-2, QuickBird, and WorldView-3 datasets. The GS and Wavelet methods have worse QNR values, and the DL-based approaches perform better in terms of the overall metrics. For the QuickBird dataset, the proposed method achieves the best results in QNR and $D_\lambda$, with the second-best result in $D_s$. The proposed method produces better fusion results on all three datasets compared with the other methods.
Figure 12, Figure 13 and Figure 14 show the fused images of the three datasets for the full-scale experiments. The fused images of the traditional methods still have deficiencies in retaining spectral and spatial information. As shown in Figure 12, the HPF method suffers from spatial blurring, and the Wavelet method shows white artifacts. In Figure 13, the AWFLN method is poor in details such as vehicles on the road. As shown in the enlarged box, the proposed method performs better in spectral and spatial details. In Figure 14, the Wavelet and HPF methods show spectral distortion, while SEWformer, PRNet, and our proposed method produce fused images with higher resolution. Because GT images are unavailable at full scale, the visual differences are difficult to distinguish, but Table 5 shows that our method outperforms the other two methods.

4.6. Ablation Experiments

We performed ablation experiments on the GaoFen-2 dataset to assess the contribution of each network module. Table 6 presents the quantitative metrics, while Figure 15 shows the results of the ablation models along with the corresponding error maps. To emphasize the fusion details, we boxed regions of the fused images and magnified them three times. Ablation experiment 1 removed the HFCEM used for feature extraction. Ablation experiment 2 replaced the MSSCM with the parallel multi-scale structure in Figure 4b. Ablation experiment 3 employed a single layer of MSSCM combined with HFCEM.
Comparing Table 6 and Figure 15, it is evident that removing HFCEM yields the worst results on all metrics. The white buildings in both the fusion and error maps appear particularly blurred, underscoring HFCEM's critical role in the network. For Experiment 2, Table 6 shows that adding MSSCM reduces the SAM by 0.12, indicating the validity of the proposed MSSCM.
To assess the effectiveness of the double-layer structure, a single layer of HFCEM and MSSCM was used in Experiment 3. The overall comparison shows degraded metric values, and analysis of the error maps shows that improvements in detail are still lacking. In addition, we conducted experiments to study the effect of the number of network layers. Figure 16 and Figure 17 illustrate the number of parameters and the SAM values for models with 1 to 6 layers. The figures show that the model's parameter count increases steadily as the number of layers grows, but when the number of layers exceeds 2, the improvement in SAM slows down. Weighing the metric values against the number of parameters, we chose the double-layer network as the final model.
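For reference, the parameter counts in Figure 16 can be reproduced with a simple helper such as the sketch below; the depth-configurable constructor it loops over is hypothetical, since the network sketch in Section 3 is fixed at two layers.

```python
# Count trainable parameters of a model; looping over a hypothetical
# depth-configurable variant would reproduce the curve in Figure 16.
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# for depth in range(1, 7):
#     print(depth, count_parameters(build_model(num_layers=depth)))  # hypothetical builder
```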

5. Discussion

First, in Section 4.4, we performed reduced-scale experiments on the GaoFen-2, QuickBird, and WorldView-3 datasets. We compared the proposed method with eleven other methods through qualitative and quantitative analyses, and the best values in the quantitative assessment are shown in bold. We boxed and magnified the local details three times to observe the fused images more clearly, and we used error maps to show the difference between the fused images and the GT images. From Figure 6, Figure 7, Figure 8, Figure 9, Figure 10 and Figure 11, the DL-based methods produce high-quality fusion results on all three datasets, and the metric values in Table 2, Table 3 and Table 4 verify their merits. In both the qualitative and quantitative results, the proposed method has the best overall metric values and produces the fused images closest to the GT images, which further demonstrates its good performance.
In Section 4.5, we performed full-scale experiments. As illustrated in Figure 12, Figure 13 and Figure 14, the fused images of the traditional methods are blurry, with the Wavelet method being the most obvious case. The AWFLN, SEWformer, and PRNet methods retain spatial details better than PNN and MSDCNN, but our method offers better spectral retention and spatial detail enhancement than the other DL-based methods. To verify the contribution of each module, we performed ablation experiments in Section 4.6. From Figure 15 and Table 6, it can be seen that using MSSCM or HFCEM alone improves the fusion results, but using the two modules together leads to the best fusion results.

6. Conclusions

We propose a dual-branch network with high-frequency enhancement and a multi-scale skip connection for pansharpening, to address the issue that traditional networks ignore the differences and correlations between the PAN and MS images. We use two different branch structures to extract different types of information from the images. First, we design HFCEM to fully extract the high-frequency information from the images. Second, we further design MSSCM to extract the low-frequency information. Through the above structure, the correlations between the images are enhanced, and the distortion caused by the differences in image information is effectively alleviated. Finally, comparisons with eleven different methods on the GaoFen-2, QuickBird, and WorldView-3 datasets show that the proposed method delivers superior performance in both reduced- and full-scale experiments.

Author Contributions

Conceptualization, W.H. and Y.L.; methodology, W.H. and Y.L.; software, Y.L. and L.S.; validation, L.S., Q.C. and L.G.; writing—original draft preparation, W.H. and Y.L.; writing—review and editing, Q.C. and L.G. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by the National Natural Science Foundation of China, grant number 62471239.

Data Availability Statement

The link to the dataset used in the article is as follows: https://liangjiandeng.github.io/PanCollection.html (accessed on 12 January 2025).

Acknowledgments

We are grateful to the editors and reviewers for their advice.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Gilbertson, J.K.; Kemp, J.; Niekerk, A.V. Effect of pan-sharpening multi-temporal Landsat 8 imagery for crop type differentiation using different classification techniques. Comput. Electron. Agric. 2017, 134, 151–159. [Google Scholar] [CrossRef]
  2. Ye, Q.; Li, Z.; Fu, L.; Zhang, Z.; Yang, W. Nonpeaked Discriminant Analysis for Data Representation. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 3818–3832. [Google Scholar] [CrossRef] [PubMed]
  3. Li, K.; Xie, W.; Du, Q.; Li, Y. DDLPS: Detail-Based Deep Laplacian Pansharpening for Hyperspectral Imagery. IEEE Trans. Geosci. Remote Sens. 2019, 57, 8011–8025. [Google Scholar] [CrossRef]
  4. Dadrass Javan, F.; Samadzadegan, F.; Mehravar, S.; Toosi, A.; Khatami, R.; Stein, A. A review of image fusion techniques for pan-sharpening of high-resolution satellite imagery. ISPRS J. Photogramm. Remote Sens. 2021, 171, 101–117. [Google Scholar] [CrossRef]
  5. Lu, H.; Yang, Y.; Huang, S.; Tu, W. An Efficient Pansharpening Approach Based on Texture Correction and Detail Refinement. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5. [Google Scholar] [CrossRef]
  6. Zhou, X.; Liu, J.; Liu, S.; Cao, L.; Zhou, Q.; Huang, H. A GIHS-based spectral preservation fusion method for remote sensing images using edge restored spectral modulation. ISPRS J. Photogramm. Remote Sens. 2014, 88, 16–27. [Google Scholar] [CrossRef]
  7. Ghassemian, H. A review of remote sensing image fusion methods. Inf. Fusion 2016, 32, 75–89. [Google Scholar] [CrossRef]
  8. Choi, J.; Yu, K.; Kim, Y. A New Adaptive Component-Substitution-Based Satellite Image Fusion by Using Partial Replacement. IEEE Trans. Geosci. Remote Sens. 2011, 49, 295–309. [Google Scholar] [CrossRef]
  9. Chavez, P.S.; Kwarteng, A.Y. Extracting spectral contrast in landsat thematic mapper image data using selective principal component analysis. Photogramm. Eng. Remote Sens. 1989, 55, 339–348. [Google Scholar]
  10. Vivone, G.; Maranò, S.; Chanussot, J. Pansharpening: Context-Based Generalized Laplacian Pyramids by Robust Regression. IEEE Trans. Geosci. Remote Sens. 2020, 58, 6152–6167. [Google Scholar] [CrossRef]
  11. Zhou, J.T.; Civco, D.L.; Silander, J.A. A wavelet transform method to merge Landsat TM and SPOT panchromatic data. Int. J. Remote Sens. 1998, 19, 743–757. [Google Scholar] [CrossRef]
  12. Liu, J.; Basaeed, E. Smoothing Filter-based Intensity Modulation: A spectral preserve image fusion technique for improving spatial details. Int. J. Remote Sens. 2000, 21, 3461–3472. [Google Scholar] [CrossRef]
  13. Vivone, G.; Alparone, L.; Chanussot, J. A Critical Comparison Among Pansharpening Algorithms. IEEE Trans. Geosci. Remote Sens. 2015, 53, 2565–2586. [Google Scholar] [CrossRef]
  14. Cunha, A.L.; Zhou, J.; Do, M.N. The Nonsubsampled Contourlet Transform: Theory, Design, and Applications. IEEE Trans. Image Process. 2006, 15, 3089–3101. [Google Scholar] [CrossRef] [PubMed]
  15. Vivone, G.; Restaino, R.; Chanussot, J. Full Scale Regression-Based Injection Coefficients for Panchromatic Sharpening. IEEE Trans. Image Process. 2018, 27, 3418–3431. [Google Scholar] [CrossRef] [PubMed]
  16. Vivone, G.; Alparone, L.; Garzelli, A.; Lolli, S. Fast Reproducible Pansharpening Based on Instrument and Acquisition Modeling: AWLP Revisited. Remote Sens. 2019, 11, 2315. [Google Scholar] [CrossRef]
  17. Ballester, C.; Caselles, V.; Igual, L.; Verdera, J.; Rougé, B. A Variational Model for P+XS Image Fusion. Int. J. Comput. Vis. 2006, 69, 43–58. [Google Scholar] [CrossRef]
  18. Yang, Y.; Wu, L.; Huang, S.; Sun, J.; Wan, W.; Wu, J. Compensation Details-Based Injection Model for Remote Sensing Image Fusion. IEEE Geosci. Remote Sens. Lett. 2018, 15, 734–738. [Google Scholar] [CrossRef]
  19. Fang, F.; Li, F.; Shen, C.; Zhang, G. A Variational Approach for Pan-Sharpening. IEEE Trans. Image Process. 2013, 22, 2822–2834. [Google Scholar] [CrossRef] [PubMed]
  20. Liu, P.; Xiao, L.; Zhang, J.; Naz, B. Spatial-Hessian-Feature-Guided Variational Model for Pan-Sharpening. IEEE Trans. Geosci. Remote Sens. 2016, 54, 2235–2253. [Google Scholar] [CrossRef]
  21. Zhang, K.; Wang, M.; Yang, S.; Jiao, L. Convolution Structure Sparse Coding for Fusion of Panchromatic and Multispectral Images. IEEE Trans. Geosci. Remote Sens. 2019, 57, 1117–1130. [Google Scholar] [CrossRef]
  22. Huang, W.; Xiao, L.; Wei, Z.; Liu, H.; Tang, S. A New Pan-Sharpening Method With Deep Neural Networks. IEEE Geosci. Remote Sens. Lett. 2015, 12, 1037–1041. [Google Scholar] [CrossRef]
  23. Dong, C.; Loy, C.C.; He, K.; Tang, X. Image Super-Resolution Using Deep Convolutional Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 38, 295–307. [Google Scholar] [CrossRef] [PubMed]
  24. Masi, G.; Cozzolino, D.; Verdoliva, L. Pansharpening by Convolutional Neural Networks. Remote Sens. 2016, 8, 594. [Google Scholar] [CrossRef]
  25. Yang, J.; Fu, X.; Hu, Y.; Huang, Y.; Ding, X.; Paisley, J.W. PanNet: A Deep Network Architecture for Pan-Sharpening. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 1753–1761. [Google Scholar]
  26. Yuan, Q.; Wei, Y.; Meng, X.; Shen, H.; Zhang, L. A Multiscale and Multidepth Convolutional Neural Network for Remote Sensing Imagery Pan-Sharpening. IEEE J. Sel Top. Appl. Earth Observ. Remote Sens. 2017, 11, 978–989. [Google Scholar] [CrossRef]
  27. Deng, L.; Vivone, G.; Jin, C.; Chanussot, J. Detail Injection-Based Deep Convolutional Neural Networks for Pansharpening. IEEE Trans. Geosci. Remote Sens. 2021, 59, 6995–7010. [Google Scholar] [CrossRef]
  28. Zhang, T.; Deng, L.; Huang, T.; Chanussot, J.; Vivone, G. A Triple-Double Convolutional Neural Network for Panchromatic Sharpening. IEEE Trans. Neural Netw. Learn. Syst. 2021, 34, 9088–9101. [Google Scholar] [CrossRef] [PubMed]
  29. Tu, W.; Yang, Y.; Huang, S.; Wan, W.; Gan, L. MMDN: Multi-Scale and Multi-Distillation Dilated Network for Pansharpening. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–14. [Google Scholar] [CrossRef]
  30. Lei, D.; Huang, J.; Zhang, L.; Li, W. MHANet: A Multiscale Hierarchical Pansharpening Method With Adaptive Optimization. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–15. [Google Scholar] [CrossRef]
  31. Cheng, G.; Shao, Z.; Wang, J.; Huang, X.; Dang, C. Dual-Branch Multi-Level Feature Aggregation Network for Pansharpening. IEEE CAA J. Autom. Sinica. 2022, 9, 2023–2026. [Google Scholar] [CrossRef]
  32. Jian, L.; Wu, S.; Chen, L.; Vivone, G.; Rayhana, R.; Zhang, D. Multi-Scale and Multi-Stream Fusion Network for Pansharpening. Remote Sens. 2023, 15, 1666. [Google Scholar] [CrossRef]
  33. Lu, H.; Yang, Y.; Huang, S.; Chen, X. AWFLN: An Adaptive Weighted Feature Learning Network for Pansharpening. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–15. [Google Scholar] [CrossRef]
  34. Lu, H.; Guo, H.; Liu, R.; Xu, L.; Wan, W.; Tu, W.; Yang, Y. Cross-Scale Interaction With Spatial-Spectral Enhanced Window Attention for Pansharpening. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2024, 17, 11521–11535. [Google Scholar] [CrossRef]
  35. Zhao, X.; Zhang, Y.; Guo, J.; Zhu, Y.; Zhou, G.; Zhang, W.; Wu, Y. Progressive Reconstruction Network With Adaptive Frequency Adjustment for Pansharpening. IEEE J. Sel Top. Appl. Earth Observ. Remote Sens. 2024, 17, 17382–17397. [Google Scholar] [CrossRef]
  36. Xiong, Z.; Liu, N.; Wang, N.; Sun, Z.; Li, W. Unsupervised Pansharpening Method Using Residual Network With Spatial Texture Attention. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–12. [Google Scholar] [CrossRef]
  37. Deng, L.; Vivone, G.; Paoletti, M.E.; Scarpa, G.; He, J.; Zhang, Y. Machine Learning in Pansharpening: A benchmark, from shallow to deep networks. IEEE Geosci. Remote Sens Mag. 2022, 10, 279–315. [Google Scholar] [CrossRef]
  38. Vivone, G.; Dalla Mura, M.; Garzelli, A.; Restaino, R. A New Benchmark Based on Recent Advances in Multispectral Pansharpening: Revisiting Pansharpening With Classical and Emerging Pansharpening Methods. IEEE Geosci. Remote Sens. Mag. 2021, 9, 53–81. [Google Scholar] [CrossRef]
  39. Alparone, L.; Wald, L.; Chanussot, J.; Thomas, C.; Gamba, P.; Bruce, L.M. Comparison of Pansharpening Algorithms: Outcome of the 2006 GRS-S Data-Fusion Contest. IEEE Trans. Geosci. Remote Sens. 2007, 45, 3012–3021. [Google Scholar] [CrossRef]
  40. Liu, X.; Wang, Y.; Liu, Q. Remote Sensing Image Fusion Based on Two-Stream Fusion Network. Inf. Fusion 2020, 55, 1–15. [Google Scholar] [CrossRef]
  41. Wang, Z.; Bovik, A.C. A universal image quality index. IEEE Signal Process. Lett. 2002, 9, 81–84. [Google Scholar] [CrossRef]
  42. Alparone, L.; Baronti, S.; Garzelli, A.; Nencini, F. A global quality measurement of pan-sharpened multispectral imagery. IEEE Geosci. Remote Sens. Lett. 2004, 1, 313–317. [Google Scholar] [CrossRef]
  43. Alparone, L.; Aiazzi, B.; Baronti, S.; Garzelli, A. Multispectral and panchromatic data fusion assessment without reference. Photogramm. Eng. Remote Sens. 2008, 74, 193–200. [Google Scholar] [CrossRef]
Figure 1. DL-based pansharpening framework. (a) Treating MS images as low-frequency information. (b) Single-branch network. (c) Dual-branch network. (d) The proposed network structure.
Figure 2. The proposed overall network architecture.
Figure 3. HFCEM structure of the first layer.
Figure 4. (a) Residual connections. (b) Parallel multi-scale. (c) MSSCM structure of the first layer.
Figure 5. The MS and PAN images used for the experiments. In the three datasets, MS and PAN images at reduced-scale experiments are shown at the top, and MS and PAN images at full-scale experiments are shown at the bottom.
Figure 6. Pseudo-color map of the reduced-scale experiments on the GaoFen-2 dataset.
Figure 7. Error maps of reduced-scale experiments on the GaoFen-2 dataset.
Figure 8. Pseudo-color map of the reduced-scale experiments on the QuickBird dataset.
Figure 9. Error maps of reduced-scale experiments on the QuickBird dataset.
Figure 10. Pseudo-color map of the reduced-scale experiments on the WorldView-3 dataset.
Figure 11. Error maps of reduced-scale experiments on the WorldView-3 dataset.
Figure 12. Results of the GaoFen-2 dataset for full-scale experiments.
Figure 13. Results of the QuickBird dataset for full-scale experiments.
Figure 14. Results of the WorldView-3 dataset for full-scale experiments.
Figure 15. Visual results of the ablation experiment on the GaoFen-2 dataset: (a) ablation 1, (b) ablation 2, (c) ablation 3, (d) Proposed, (e) GT.
Figure 16. Total number of parameters for layers 1 to 6.
Figure 17. Average test result SAM for layers 1 to 6.
Table 1. Specific settings for the datasets.
Satellite | Band | Resolution (m) | Number of Training Images | Number of Validation Images
GaoFen-2 | LRMS | 4 | 19,809 | 2201
GaoFen-2 | PAN | 1 | |
QuickBird | LRMS | 2.44 | 17,139 | 1905
QuickBird | PAN | 0.61 | |
WorldView-3 | LRMS | 1.2 | 9714 | 1080
WorldView-3 | PAN | 0.3 | |
Table 2. Quantitative evaluation based on the GaoFen-2 dataset.
Method | SAM | ERGAS | CC | Q | Q2n
Reference | 0 | 0 | 1 | 1 | 1
GS | 2.1482 | 2.4529 | 0.9693 | 0.9825 | 0.8469
PRACS | 1.7781 | 1.7027 | 0.9859 | 0.9874 | 0.9236
Wavelet | 1.9624 | 2.0984 | 0.9761 | 0.9796 | 0.8759
HPF | 1.7716 | 1.7993 | 0.9843 | 0.9876 | 0.9135
PNN | 1.1663 | 1.2583 | 0.9930 | 0.9941 | 0.9639
MSDCNN | 1.0074 | 1.0629 | 0.9941 | 0.9955 | 0.9711
FusionNet | 1.0439 | 1.1522 | 0.9933 | 0.9956 | 0.9658
TDNet | 0.9170 | 0.8555 | 0.9964 | 0.9965 | 0.9790
AWFLN | 0.8978 | 0.8191 | 0.9968 | 0.9966 | 0.9822
SEWformer | 0.8775 | 0.8059 | 0.9966 | 0.9967 | 0.9823
PRNet | 0.8881 | 0.8079 | 0.9969 | 0.9966 | 0.9824
Proposed | 0.8653 | 0.7972 | 0.9969 | 0.9968 | 0.9823
Table 3. Quantitative assessment indicators for the QuickBird dataset.
Method | SAM | ERGAS | CC | Q | Q2n
Reference | 0 | 0 | 1 | 1 | 1
GS | 4.6296 | 7.0340 | 0.9393 | 0.9380 | 0.7617
PRACS | 4.0177 | 6.5890 | 0.9456 | 0.9470 | 0.8194
Wavelet | 4.8055 | 7.3663 | 0.9301 | 0.9294 | 0.7610
HPF | 4.2084 | 6.6342 | 0.9454 | 0.9453 | 0.8074
PNN | 3.9218 | 5.9846 | 0.9569 | 0.9569 | 0.8901
MSDCNN | 3.7318 | 5.7833 | 0.9603 | 0.9601 | 0.8895
FusionNet | 3.6742 | 5.7089 | 0.9585 | 0.9593 | 0.8864
TDNet | 3.6608 | 5.7846 | 0.9569 | 0.9513 | 0.8501
AWFLN | 3.6342 | 5.7399 | 0.9603 | 0.9601 | 0.8937
SEWformer | 3.5948 | 5.5306 | 0.9619 | 0.9624 | 0.9005
PRNet | 3.5750 | 5.7314 | 0.9621 | 0.9618 | 0.9011
Proposed | 3.5110 | 5.3458 | 0.9626 | 0.9627 | 0.9012
Table 4. Quantitative assessment indicators for the WorldView-3 dataset.
Method | SAM | ERGAS | CC | Q | Q2n
Reference | 0 | 0 | 1 | 1 | 1
GS | 4.0384 | 3.8543 | 0.9553 | 0.9017 | 0.8584
PRACS | 4.0031 | 3.4407 | 0.9579 | 0.9083 | 0.8812
Wavelet | 5.281 | 4.4258 | 0.9321 | 0.8766 | 0.8361
HPF | 3.9317 | 3.7153 | 0.9492 | 0.9111 | 0.8725
PNN | 3.4077 | 3.7553 | 0.9427 | 0.9280 | 0.8881
MSDCNN | 3.2809 | 3.8107 | 0.9430 | 0.9320 | 0.8858
FusionNet | 3.2728 | 3.8245 | 0.9427 | 0.9321 | 0.8836
TDNet | 3.3098 | 3.7089 | 0.9467 | 0.9337 | 0.8911
AWFLN | 3.1754 | 3.3648 | 0.9572 | 0.9402 | 0.9111
SEWformer | 3.2081 | 3.4954 | 0.9542 | 0.9370 | 0.9063
PRNet | 3.2711 | 3.2866 | 0.9583 | 0.9385 | 0.9124
Proposed | 3.1598 | 3.3370 | 0.9584 | 0.9408 | 0.9125
Table 5. Quantitative assessment indicators for the full-scale experiments.
 | GaoFen-2 | | | QuickBird | | | WorldView-3 | |
Method | QNR | D_λ | D_s | QNR | D_λ | D_s | QNR | D_λ | D_s
Reference | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0
GS | 0.8756 | 0.0683 | 0.0602 | 0.6439 | 0.0888 | 0.2934 | 0.8928 | 0.0159 | 0.0928
PRACS | 0.8907 | 0.0355 | 0.0766 | 0.6837 | 0.0757 | 0.2603 | 0.8904 | 0.0259 | 0.0858
Wavelet | 0.8699 | 0.0918 | 0.0422 | 0.6697 | 0.1491 | 0.2129 | 0.8586 | 0.0954 | 0.0509
HPF | 0.9039 | 0.0468 | 0.0517 | 0.7424 | 0.0947 | 0.1799 | 0.9033 | 0.0303 | 0.0685
PNN | 0.9825 | 0.0128 | 0.0048 | 0.8358 | 0.0508 | 0.1195 | 0.9436 | 0.0235 | 0.0337
MSDCNN | 0.9814 | 0.0138 | 0.0049 | 0.8482 | 0.0703 | 0.0877 | 0.9545 | 0.0165 | 0.0295
FusionNet | 0.9758 | 0.0159 | 0.0084 | 0.8818 | 0.0726 | 0.0492 | 0.9552 | 0.0114 | 0.0338
TDNet | 0.9842 | 0.0102 | 0.0057 | 0.8877 | 0.0594 | 0.0562 | 0.9469 | 0.0232 | 0.0306
AWFLN | 0.9823 | 0.0076 | 0.0102 | 0.8922 | 0.0578 | 0.0531 | 0.9559 | 0.0151 | 0.0295
SEWformer | 0.9826 | 0.0126 | 0.0049 | 0.8967 | 0.0520 | 0.0541 | 0.9573 | 0.0119 | 0.0312
PRNet | 0.9874 | 0.0074 | 0.0052 | 0.8983 | 0.0511 | 0.0533 | 0.9581 | 0.0131 | 0.0292
Proposed | 0.9885 | 0.0071 | 0.0044 | 0.9070 | 0.0423 | 0.0529 | 0.9637 | 0.0087 | 0.0278
Table 6. Comparison of quantitative assessment of ablation experiments on the GaoFen-2 dataset.
Index | MSSCM | HFCEM | Double layer | SAM | ERGAS | CC | Q | Q2n
1 | ✓ | | ✓ | 1.2701 | 1.7005 | 0.9864 | 0.9935 | 0.9466
2 | | ✓ | ✓ | 0.9886 | 0.8942 | 0.9943 | 0.9957 | 0.9754
3 | ✓ | ✓ | | 1.0408 | 1.2261 | 0.9920 | 0.9946 | 0.9681
4 | ✓ | ✓ | ✓ | 0.8653 | 0.7972 | 0.9969 | 0.9968 | 0.9823