Article

Pansharpening Based on Multimodal Texture Correction and Adaptive Edge Detail Fusion

by Danfeng Liu 1,*, Enyuan Wang 1, Liguo Wang 1, Jón Atli Benediktsson 2, Jianyu Wang 1 and Lei Deng 1

1 College of Information and Communication Engineering, Dalian Minzu University, Dalian 116600, China
2 Faculty of Electrical and Computer Engineering, University of Iceland, 107 Reykjavik, Iceland
* Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(16), 2941; https://doi.org/10.3390/rs16162941
Submission received: 30 June 2024 / Revised: 27 July 2024 / Accepted: 8 August 2024 / Published: 11 August 2024

Abstract: Pansharpening refers to the process of fusing multispectral (MS) images with panchromatic (PAN) images to obtain high-resolution multispectral (HRMS) images. However, due to the low correlation and similarity between MS and PAN images, as well as inaccuracies in spatial information injection, HRMS images often suffer from significant spectral and spatial distortions. To address these issues, a pansharpening method based on multimodal texture correction and adaptive edge detail fusion is proposed in this paper. To obtain a texture-corrected ($TC$) image that is highly correlated and similar to the MS image, the target-adaptive CNN-based pansharpening (A-PNN) method is introduced. By constructing a multimodal texture correction model, intensity, gradient, and A-PNN-based deep plug-and-play correction constraints are established between the $TC$ and source images. Additionally, an adaptive degradation filter algorithm is proposed to ensure the accuracy of these constraints. Since the obtained $TC$ image can effectively replace the PAN image, and considering that the MS image contains valuable spatial information, an adaptive edge detail fusion algorithm is also proposed. This algorithm adaptively extracts detail information from the $TC$ and MS images and applies edge protection. Given the limited spatial information in the MS image, its spatial information is proportionally enhanced before the adaptive fusion. The fused spatial information is then injected into the upsampled multispectral (UPMS) image to produce the final HRMS image. Extensive experimental results demonstrate that, compared with other methods, the proposed algorithm achieved superior results in terms of both subjective visual quality and objective evaluation metrics.

1. Introduction

In recent decades, with the launch of an increasing number of remote sensing satellites, the imagery captured by these satellites has been widely utilized in fields such as urban planning, climate monitoring, and agriculture, leading to rapid advancements in remote sensing technology [1]. However, due to limitations in satellite imaging sensor hardware, it is difficult to simultaneously obtain both high spatial resolution and high spectral resolution in multispectral images. Multispectral (MS) images provide rich spectral information but have a lower spatial resolution, whereas panchromatic (PAN) images offer a higher spatial resolution but poorer spectral information [2]. Pansharpening techniques therefore enhance the spatial resolution of low-resolution multispectral (LRMS) images by fusing LRMS and PAN images, leveraging their respective strengths to obtain high-resolution multispectral (HRMS) images [3]. The HRMS images obtained through pansharpening can be widely used in fields such as crop classification, target extraction, ecological monitoring, and geological research. This makes pansharpening technology an indispensable part of the production and application of remote sensing images. Therefore, in-depth research on pansharpening technology has extensive application prospects and practical significance [4].
Due to the rapid development of pansharpening techniques in recent years, they can be categorized into four main types: component substitution (CS)-based methods, multi-resolution analysis (MRA)-based methods, variational optimization (VO)-based methods, and deep learning (DL)-based methods [5]. CS-based methods involve separating the upsampled multispectral (UPMS) image into spatial and spectral components, then substituting the spatial components with information from the PAN image. Common classical CS-based methods include intensity–hue–saturation (IHS) [6], principal component analysis (PCA) [7], partial replacement adaptive component substitution (PRACS) [8], and band-dependent spatial detail (BDSD) [9]. CS-based methods typically preserve spatial details well, thus achieving high spatial quality, and are straightforward to implement with high computational efficiency. However, they can suffer from significant spectral distortions.
MRA-based methods decompose UPMS and PAN images into multiple spatial scales and perform fusion separately at each scale. Common classical MRA-based methods include the generalized Laplacian pyramid (GLP) [10], wavelet transform (WT) [11], nonsubsampled contourlet transform (NSCT) [12], and nonsubsampled shearlet transform (NSST) [13]. While MRA-based methods tend to preserve spectral information effectively, they may introduce spatial distortions due to the decomposition of spatial structures.
VO-based methods apply spectral and spatial prior constraints between the MS, PAN, and ideal HRMS images to construct a reasonable degradation model and perform optimization solving. VO-based methods can be categorized into model-based optimization, regularization-based optimization, Bayesian-based optimization, and sparse representation-based methods. Examples of classical methods for constructing VO-based models include a variational approach for pan-sharpening [14], total variation (TV) [15], high-quality Bayesian pansharpening [16], and remote sensing image fusion via compressive sensing [17]. VO-based methods typically preserve spatial and spectral information better than CS- and MRA-based methods, which results in superior fusion outcomes. However, making unreasonable model assumptions can lead to unpredictable deviations. Therefore, these methods require accurate mathematical modeling and further efficiency improvements.
DL-based methods have gained widespread application in fields such as pansharpening due to their strong feature extraction and nonlinear learning capabilities. For instance, Masi et al. introduced pansharpening by convolutional neural networks (PNN) [18], which was the first framework to integrate CNNs into the pansharpening process with a simple three-layer network. Scarpa et al. proposed target-adaptive CNN-based pansharpening (A-PNN) [19], which adapts to different sensor inputs during training and achieves better performance and faster training than the PNN method. Generally, DL-based methods can achieve excellent fusion results. However, they require a large amount of training data to optimize the network parameters and significant computational resources for training. Moreover, DL-based models trained on specific datasets tend to perform well only on images that closely match the training data characteristics. Once trained, the network parameters are fixed, which limits their adaptability to new datasets from different sensors. As a result, DL-based methods may struggle to further improve their accuracy without retraining on new data.
Currently, many pansharpening methods face the problem of low correlation and similarity between MS and PAN images, which results in the inaccurate extraction of spatial details, or even in relying solely on PAN images for detail extraction. During the fusion process, it is challenging to balance the spectral and spatial information, which leads to spatial and spectral distortions in the fused image. Although DL-based methods can be applied to balance spectral and spatial information, supervised networks, for example, perform well only on data resembling their training set, and frequent retraining on different datasets sharply increases the training time and cost. To address these issues, a multimodal texture correction and adaptive edge detail fusion model is proposed. The main contributions of this paper are as follows:
(1)
To enhance the correlation and similarity between source images, a multimodal texture correction model is proposed. This model takes the intensity component of the LRMS image ($I_0$), the PAN image, and the intensity component of the image fused using A-PNN ($I_{net}$) as inputs, and outputs a texture-corrected ($TC$) image. The model applies intensity correction constraints between the $TC$ and $I_0$ images; gradient correction constraints between the $TC$, $I_0$, and PAN images; and an A-PNN-based deep plug-and-play correction prior between the $TC$ and $I_{net}$ images.
(2)
Due to the difficulty in determining the degradation filter in the intensity correction constraint, an adaptive degradation filter algorithm is proposed to ensure the accuracy of each constraint prior. This algorithm adaptively determines the degradation filter in the model, thereby enhancing the correlation and similarity between $TC$ and the source images within the multimodal texture correction model.
(3)
To achieve accurate spatial information injection, an adaptive edge detail fusion model is proposed. This model adaptively extracts the detail information from $TC$ and applies edge protection; similarly, it extracts the detail information from the UPMS image and applies edge protection. The spatial information of the UPMS image is then elevated to the same level as that of $TC$. Finally, the spatial information of the $TC$ and UPMS images is adaptively fused to obtain more accurate spatial information. Extensive experiments were conducted on four datasets in this paper. The subjective and objective evaluation fusion results demonstrate that the proposed algorithm achieved superior performance compared with the other methods while also maintaining high operational efficiency.
The rest of this paper is organized as follows. Section 2 describes the related work. Section 3 provides a detailed introduction to the proposed model. Section 4 presents and analyzes the experimental results of two comparative experiments, namely, simulation and real experiments, conducted on four datasets. Section 5 concludes this paper and suggests directions for future research.

2. Related Works

2.1. Injection Model

Injection models are commonly used in pansharpening methods. The main idea is to inject the high-spatial-resolution details of the PAN image into the original UPMS image, which has a high spectral resolution, to generate an HRMS image. This approach addresses the lack of significant spatial information in LRMS images [20]. Assume the dimensions of the LRMS image are $l \times w \times B$ (i.e., length × width × number of bands) and the dimensions of the PAN image are $L \times W$, where $l = L/r$ and $w = W/r$, with $r$ representing the resolution ratio. Consequently, the dimensions of both the UPMS and HRMS images are $L \times W \times B$. The injection model can be uniformly expressed as
$$M_{HR} = M_{UP} + G\,S_D \qquad (1)$$
where $M_{HR}$ represents the HRMS image, $M_{UP}$ represents the UPMS image, $G$ represents the injection gain, and $S_D$ represents the injected spatial detail information. The methods for extracting $S_D$ can be broadly divided into CS- and MRA-based methods. For CS-based methods, $S_D$ can be extracted using the following formula:
$$S_D = P_I - I_{UP} \qquad (2)$$
where $P_I$ represents the PAN image histogram-matched to the intensity component of the UPMS image ($I_{UP}$). Histogram matching ensures that the intensity and contrast of the PAN and LRMS images lie within the same grayscale range, which ensures the accuracy of the spatial information extraction. The formula for $P_I$ is as follows:
$$P_I = \frac{\sigma_I}{\sigma_P}\,(P - \mu_P) + \mu_I \qquad (3)$$
where $P$ represents the original PAN image; $\mu_P$ and $\mu_I$ denote the mean values of the $P$ and $I_{UP}$ images, respectively; and $\sigma_P$ and $\sigma_I$ denote their standard deviations. $I_{UP}$ is obtained from $M_{UP}$ through a linear weighting of its bands:
$$I_{UP} = \sum_{i=1}^{B} \omega_i M_{UP}^i \qquad (4)$$
where $\omega$ denotes the linear weighting coefficients, and the superscript or subscript $i$ indicates the $i$-th band of the image. For MRA-based methods, $S_D$ can be extracted using the following formula:
$$S_D = P - P_D, \quad P_D = H_{LP}\,P \qquad (5)$$
where $P_D$ represents the degraded PAN image, obtained by applying a low-pass filter $H_{LP}$ to $P$; $H_{LP}$ introduces a blur effect to $P$. However, issues such as inaccurately injected spatial detail information still persist. Since the spatial detail missing from the LRMS image is generally inferred from the PAN image, inaccuracies in this inference and potential mismatches with the spectral information during fusion can prevent the simultaneous preservation of spectral and spatial fidelity, which leads to spectral and spatial distortions in the fused image.
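To make Equations (1)–(4) concrete, the following sketch (Python with NumPy, assuming the MS image has already been interpolated to the PAN grid and that all arrays are floating point) builds the intensity component with uniform band weights, histogram-matches the PAN image to it, and injects the CS-style detail with a unit gain. The uniform weights and unit gain are illustrative assumptions, not choices made in this paper.

```python
import numpy as np

def cs_injection(ms_up, pan, weights=None, gain=1.0):
    """CS-style pansharpening sketch following Eqs. (1)-(4).

    ms_up : (H, W, B) upsampled MS image (float)
    pan   : (H, W) panchromatic image (float)
    weights, gain : illustrative defaults (uniform weights, unit gain)
    """
    H, W, B = ms_up.shape
    w = np.full(B, 1.0 / B) if weights is None else np.asarray(weights)

    # Eq. (4): intensity component as a linear combination of the MS bands.
    i_up = np.tensordot(ms_up, w, axes=([2], [0]))

    # Eq. (3): match the PAN image to the mean/std of the intensity component.
    p_i = i_up.std() / pan.std() * (pan - pan.mean()) + i_up.mean()

    # Eq. (2): spatial detail to be injected.
    s_d = p_i - i_up

    # Eq. (1): inject the same detail into every band with a constant gain.
    return ms_up + gain * s_d[..., None]
```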

2.2. VO-Based Model

VO-based methods have become increasingly popular in recent years. They establish mathematical models to ensure the accuracy of both the spectral and spatial information in images [21]. The established mathematical model can be viewed as a degradation model, in which the ideal HRMS image is restored from the LRMS and PAN images; this process is the inverse of the degradation of the ideal HRMS image into the source images. Therefore, VO-based methods can preserve the spatial and spectral information of the LRMS and PAN images through various optimization algorithms and combine them into the desired ideal HRMS image. Many researchers have proposed VO-based methods. Wu et al. proposed a new variational approach based on proximal deep injection and gradient intensity similarity for spatial–spectral image fusion (DMPIF) [22], which integrates deep convolutional neural networks into the VO-based framework to enhance the performance. Liu et al. introduced a unified pansharpening method with structure tensor-driven spatial consistency and a deep plug-and-play prior [23], which utilizes novel spatial consistency and deep plug-and-play consistency methods to better preserve spatial information while maintaining spectral fidelity, thereby improving the fusion quality. Lu et al. proposed an intensity mixture and band-adaptive detail fusion method for pansharpening [24], which maintains intensity gradient information between the source and generated images and adaptively injects detail information guided by the source and generated images to preserve the spectral information. Xiao et al. proposed a new context-aware detail injection fidelity method with adaptive coefficients estimation for variational pansharpening (CDIF) [25], which constructs complex relationships in the gradient domain between the PAN and HRMS images, and thus effectively extracts the essential features from the source images and estimates the adaptive coefficients for the model. Ayas et al. proposed an efficient pansharpening method via texture-based dictionary learning and sparse representation [26], which utilizes dictionary learning and sparse representation to generate a compact single dictionary from the texture information of MS images and yields more effective fusion outcomes. Iterative optimization of the constructed VO-based models is typically conducted using common methods such as gradient descent [27], split Bregman iteration [28], fast iterative shrinkage-thresholding (FISTA) [29], and the alternating direction method of multipliers (ADMM) [30]. In summary, VO-based methods generally establish an energy function among the LRMS, PAN, and ideal HRMS images, which can be divided into three terms: the first term enforces spectral fidelity, the second term enforces spatial fidelity, and the third term is a regularization prior. The specific formulas are as follows:
$$E(M_{HR}) = f_{spectral}(M_0, M_{HR}) + f_{spatial}(P, M_{HR}) + f_{prior}(M_{HR}) \qquad (6)$$
where $M_0$ is the LRMS image. By applying blurring and downsampling operations to $M_{HR}$, $M_0$ can be obtained; $P$ can likewise be approximated from $M_{HR}$ through a linear weighted combination of its bands. Therefore, the energy function in Equation (6) can be simplified to the following commonly used form:
$$E(M_{HR}) = \lambda_1 \left\| D H_{LP} M_{HR} - M_0 \right\|_F^2 + \left\| P - C M_{HR} \right\|_F^2 + \lambda_2 f_{prior}(M_{HR}) \qquad (7)$$
where $\lambda_1$ and $\lambda_2$ are penalty parameters, $D$ represents the downsampling matrix, and $C$ represents the linear weighted combination matrix. By optimizing the above equation, $M_{HR}$ can be obtained. Although VO-based methods can simultaneously preserve relatively accurate spectral and spatial information, they rely on the accuracy of the established mathematical model. An unreasonable VO-based model can overlook the correlation and similarity between the MS and PAN images, leading to mismatched spectral and spatial information, which can result in spectral and spatial distortions in the final HRMS image. Furthermore, the efficiency of most VO-based pansharpening models is also relatively low.
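As a concrete illustration of the structure of Equation (7), the sketch below evaluates the three terms of such an energy for a candidate HRMS image, modeling $D H_{LP}$ as a Gaussian blur followed by decimation, $C$ as a simple band average, and using a total-variation term as the regularization prior. The blur width, decimation ratio, uniform band weights, and TV prior are illustrative assumptions; the actual operators are sensor and model dependent.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def vo_energy(m_hr, m_0, pan, lam1=1.0, lam2=0.01, ratio=4, sigma=1.0):
    """Evaluate an Eq. (7)-style energy for a candidate HRMS image.

    m_hr : (H, W, B) candidate high-resolution MS image
    m_0  : (H/ratio, W/ratio, B) observed LRMS image
    pan  : (H, W) PAN image
    """
    # Spectral term: blur + decimate each band and compare with the LRMS image.
    blurred = np.stack([gaussian_filter(m_hr[..., b], sigma)
                        for b in range(m_hr.shape[-1])], axis=-1)
    spectral = np.sum((blurred[::ratio, ::ratio, :] - m_0) ** 2)

    # Spatial term: a linear combination (here the band mean) should resemble the PAN image.
    spatial = np.sum((pan - m_hr.mean(axis=-1)) ** 2)

    # Regularization prior: anisotropic TV on each band (illustrative choice).
    tv = sum(np.abs(np.diff(m_hr[..., b], axis=0)).sum() +
             np.abs(np.diff(m_hr[..., b], axis=1)).sum()
             for b in range(m_hr.shape[-1]))

    return lam1 * spectral + spatial + lam2 * tv
```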

3. Methodology

3.1. The Proposed Model Framework

To address the issues of low correlation and similarity between the LRMS, PAN, and HRMS images, as well as the inaccuracy of the spatial information injected into UPMS images, a pansharpening method based on multimodal texture correction and adaptive edge detail fusion is proposed. This method reduces the spectral and spatial distortions in HRMS images.
The input of the multimodal texture correction model consists of the $I_0$, PAN, and $I_{net}$ images, with $TC$ as the output. An intensity correction prior is established to enforce the intensity constraint between the $I_0$ and $TC$ images. A gradient correction prior is established to enforce the gradient constraint among the $I_0$, PAN, and $TC$ images. An A-PNN-based deep plug-and-play correction prior is established to enforce the intensity and gradient constraints between the $I_{net}$ and $TC$ images. These three correction priors form the foundation of the multimodal texture correction model. Additionally, the proposed adaptive degradation filter algorithm is used within the intensity correction prior to accurately obtain an adaptive degradation filter $H_A$ for degrading $TC$. This ensures that the degraded $TC$ maintains the highest correlation and similarity with the $I_0$ image. Finally, the ADMM is employed to optimize this model and obtain $TC$. Owing to the high correlation and similarity between $TC$ and the source images, $TC$ preserves the spectral information of the LRMS image, inherits the gradient information of the PAN image, and, through $I_{net}$, retains more image features, which further ensures the stability of the texture information. Therefore, $TC$ can be used to replace the PAN image in the subsequent fusion operations.
In the adaptive edge detail fusion model, spatial detail information exists not only in $TC$ but also partially in the MS image. Therefore, the detail information of $TC$ is adaptively extracted and edge protection is applied. Simultaneously, the detail information of the UPMS image is extracted using a modulation transfer function (MTF)-matched Gaussian filter, with edge protection applied. The detail information of the edge-protected UPMS image is enhanced to the level of $TC$ and adaptively fused with the detail information of the edge-protected $TC$. This process yields spatial information with a high correlation and similarity to the source images. The spatial information is injected into the UPMS image in appropriate proportions to obtain the final HRMS image. The model framework of this study is illustrated in Figure 1, and the specific processes are detailed in Section 3.2 and Section 3.3.

3.2. Multimodal Texture Correction Model

3.2.1. Intensity Correction Prior

As stated in Section 2.2, the spectral fidelity term can be obtained by blurring and downsampling the HRMS image so that it approximates the LRMS image, as specified by the following formula:
$$f_{spectral}^i = \frac{1}{2} \left\| D H M_{HR}^i - M_0^i \right\|_F^2 \qquad (8)$$
where $H$ typically represents a Gaussian smoothing filter [31]. To maintain the inherent correlation and similarity between the bands, the LRMS and ideal HRMS bands are linearly weighted and summed using Equation (4), which yields $I_0$ and the intensity component of the ideal HRMS image ($I_{HR}$). The specific formula is
$$f_{spectral1} = \frac{1}{2} \left\| D H \sum_{i=1}^{B} \omega_i M_{HR}^i - \sum_{i=1}^{B} \omega_i M_0^i \right\|_F^2 = \frac{1}{2} \left\| D H\, I_{HR} - I_0 \right\|_F^2 \qquad (9)$$
Since $I_{HR}$ is unknown, we assume that $TC$ is close to and highly correlated with $I_{HR}$. Therefore, the intensity correction prior term is formulated as follows:
$$E_{intensity} = \frac{1}{2} \left\| D H\, TC - I_0 \right\|_F^2 \qquad (10)$$

3.2.2. Gradient Correction Prior

As stated in Section 3.2.1, while $TC$ preserves the spectral information, the spatial information should also be retained. This is achieved by establishing a spatial fidelity term to preserve the gradient information of the PAN image, with the specific formula as follows:
$$f_{spatial1} = \frac{\alpha}{2} \left\| \nabla^2 TC - \nabla^2 P \right\|_F^2 \qquad (11)$$
where $\alpha$ is the penalty parameter, and $\nabla^2$ denotes the Laplacian operator. Because $TC$ corrects the gradient information of the PAN image, deviations may arise in the intensity correction between the $TC$ and $I_0$ images. Therefore, it is necessary to establish an additional spatial fidelity term to keep the intensity correction prior term unbiased and to further enhance the correlation and similarity between the $TC$ and $I_0$ images. The specific formula is as follows:
$$f_{spatial2} = \frac{\beta}{2} \left\| \nabla^2 (D H\, TC) - \nabla^2 I_0 \right\|_F^2 \qquad (12)$$
where $\beta$ is the penalty parameter. In summary, the gradient correction prior term can be represented as follows:
$$E_{gradient} = \frac{\alpha}{2} \left\| \nabla^2 TC - \nabla^2 P \right\|_F^2 + \frac{\beta}{2} \left\| \nabla^2 (D H\, TC) - \nabla^2 I_0 \right\|_F^2 \qquad (13)$$

3.2.3. A-PNN-Based Deep Plug-and-Play Prior

To generate more texture features, it is necessary to further enhance the correlation and similarity between the $TC$, $I_0$, and PAN images, and to preserve more spectral and spatial information. After fusing the PAN and UPMS images using A-PNN, the resulting HRMS image is denoted as $MS_{net}$. Applying the linear weighting of Equation (4) to $MS_{net}$ yields its intensity component, which is denoted as $I_{net}$. A spectral fidelity term is then established between $TC$ and $I_{net}$ to correct the intensity information of $TC$. The specific formula is as follows:
$$f_{spectral2} = \frac{\gamma}{2} \left\| TC - I_{net} \right\|_F^2 \qquad (14)$$
where $\gamma$ is the penalty parameter. A spatial fidelity term is subsequently established between $TC$ and $I_{net}$ to correct the gradient information of $TC$. The specific formula is
$$f_{spatial3} = \frac{\delta}{2} \left\| \nabla^2 TC - \nabla^2 I_{net} \right\|_F^2 \qquad (15)$$
where $\delta$ is the penalty parameter. In summary, the A-PNN-based deep plug-and-play prior can be expressed as
$$E_{DPP} = \frac{\gamma}{2} \left\| TC - I_{net} \right\|_F^2 + \frac{\delta}{2} \left\| \nabla^2 TC - \nabla^2 I_{net} \right\|_F^2 \qquad (16)$$
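The three correction priors of Sections 3.2.1–3.2.3 can be evaluated for a candidate $TC$ as in the sketch below (Python with SciPy). The Laplacian is taken with scipy.ndimage.laplace, and the degradation $DH$ is modeled as a Gaussian blur followed by decimation; the blur width and decimation ratio are illustrative assumptions rather than the paper's adaptive filter.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, laplace

def correction_priors(tc, i_0, pan, i_net, alpha, beta, gamma, delta,
                      ratio=4, sigma=1.0):
    """Return E_intensity, E_gradient, E_DPP of Eqs. (10), (13), (16) for a candidate TC."""
    def degrade(x):                      # D H x: Gaussian blur, then decimation
        return gaussian_filter(x, sigma)[::ratio, ::ratio]

    e_intensity = 0.5 * np.sum((degrade(tc) - i_0) ** 2)                        # Eq. (10)

    e_gradient = (alpha / 2 * np.sum((laplace(tc) - laplace(pan)) ** 2)         # Eq. (11)
                  + beta / 2 * np.sum((laplace(degrade(tc)) - laplace(i_0)) ** 2))  # Eq. (12)

    e_dpp = (gamma / 2 * np.sum((tc - i_net) ** 2)                              # Eq. (14)
             + delta / 2 * np.sum((laplace(tc) - laplace(i_net)) ** 2))         # Eq. (15)

    return e_intensity, e_gradient, e_dpp
```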

3.2.4. Proposed Model

To ensure the sparsity of the output texture image and to reduce artifacts, in addition to the intensity correction prior term, the gradient correction prior term, and the A-PNN-based deep plug-and-play correction prior term introduced above, we also employ a TV regularization term. Therefore, a multimodal texture correction model is proposed in this paper, with the following specific formula:
$$TC = \arg\min_{TC} \frac{1}{2} \left\| D H\, TC - I_0 \right\|_F^2 + \frac{\alpha}{2} \left\| \nabla^2 TC - \nabla^2 P \right\|_F^2 + \frac{\beta}{2} \left\| \nabla^2 (D H\, TC) - \nabla^2 I_0 \right\|_F^2 + \frac{\gamma}{2} \left\| TC - I_{net} \right\|_F^2 + \frac{\delta}{2} \left\| \nabla^2 TC - \nabla^2 I_{net} \right\|_F^2 + \theta \left\| \nabla^2 TC \right\|_1 \qquad (17)$$
where $\theta$ is the penalty parameter.

3.2.5. Adaptive Degradation Filter Algorithm

In the model given by Equation (17), all variables except $H$ and $TC$ are known. $TC$ is determined in Section 3.2.6, while $H$ is difficult to determine. Therefore, we propose an adaptive degradation filter algorithm that uses a Gaussian filter as the degradation filter, which is denoted as $H_A$ and can be determined by
$$H_A = \arg\min_{H_A} \frac{1}{2} \left\| D H_A\, TC - I_0 \right\|_F^2 \qquad (18)$$
From the above equation, it can be seen that when the difference between $D H_A\, TC$ and $I_0$ is minimized, the correlation and similarity between them are maximized, making $H_A$ the optimal degradation filter. Therefore, this algorithm jointly considers the correlation and similarity, measured by the correlation coefficient (CC) [32] and the structural similarity index measure (SSIM) [33], respectively, to adaptively determine the optimal degradation filter. When the filter is applied in the spatial domain, the convolution operations significantly increase the computational complexity, whereas in the frequency domain the convolutions become element-wise products, which reduces the computational complexity. Therefore, $H_A$ is applied in the frequency domain, with the following frequency-domain expression:
$$H_A(u, v) = e^{-\frac{D_C^2(u, v)}{2 \sigma^2}} \qquad (19)$$
where $D_C(u, v)$ represents the distance from point $(u, v)$ to the center of the frequency domain, and $\sigma$ represents the standard deviation. After $H_A$ is transformed to the frequency domain, $TC$ also needs to be processed in the frequency domain. Therefore, the fast Fourier transform (FFT) is used to convert $TC$ to the frequency domain, and the inverse fast Fourier transform (IFFT) is used to convert $H_A\, TC$ back to the spatial domain. This facilitates the subsequent correlation and similarity calculations between $D H_A\, TC$ and $I_0$. The specific formula is as follows:
$$D H_A\, TC = D\, \mathcal{F}^{-1}\big( H_A(u, v)\, \mathcal{F}(TC) \big) \qquad (20)$$
where $\mathcal{F}(\cdot)$ denotes the FFT, and $\mathcal{F}^{-1}(\cdot)$ denotes the IFFT. In summary, determining $H_A$ hinges on identifying the unknown parameter $\sigma$. Therefore, by assessing the correlation and similarity between $D H_A\, TC$ and $I_0$, an optimal $\sigma$ can be found. The correlation is measured using the CC, denoted as $\rho(D H_A\, TC, I_0)$, and the similarity is measured using the SSIM, denoted as $S(D H_A\, TC, I_0)$. By combining these two metrics with the average rule and iterating over different $\sigma$ values, the value that maximizes the combined score is taken as the optimal $\sigma$, denoted as $\sigma_{best}$. The specific formula is as follows:
$$\sigma_{best} = \arg\max_{\sigma} \frac{\rho(D H_A\, TC, I_0) + S(D H_A\, TC, I_0)}{2} \qquad (21)$$
In summary, the overall process of the adaptive degradation filter algorithm is summarized in Algorithm 1.
Algorithm 1. Adaptive degradation filter algorithm.
Input: texture-corrected image $TC$, intensity component of the LRMS image $I_0$.
Initialize: set $\sigma^{(0)} = 1$, step length $s = 0.5$, iteration index $k = 0$;
Transform $H_A$ into the frequency domain via (19);
Calculate $(D H_A\, TC)^{(0)}$ via (20);
Calculate $\rho^{(0)}(D H_A\, TC, I_0)$ and $S^{(0)}(D H_A\, TC, I_0)$;
Repeat: $\sigma^{(k+1)} = \sigma^{(k)} + s$, $k = k + 1$;
  Optimize $(D H_A\, TC)^{(k+1)}$ via (20);
  Optimize $\rho^{(k+1)}(D H_A\, TC, I_0)$ and $S^{(k+1)}(D H_A\, TC, I_0)$;
  Calculate the combined score $\frac{\rho^{(k+1)}(D H_A\, TC, I_0) + S^{(k+1)}(D H_A\, TC, I_0)}{2}$ via (21);
Until the combined score of iteration $k+1$ is lower than that of iteration $k$;
Output: $\sigma_{best} = \sigma^{(k)}$, adaptive degradation filter $H_A$.
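A minimal sketch of Algorithm 1 is shown below, assuming the $TC$ image is exactly `ratio` times larger than $I_0$ along each axis, building the frequency-domain Gaussian of Equation (19) from the distance to the spectrum center, and using skimage.metrics.structural_similarity for the SSIM term; the starting value and step length follow the listing above, and the iteration cap is an added safeguard.

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def degrade_freq(tc, sigma, ratio=4):
    """Apply the frequency-domain Gaussian H_A (Eq. 19) to TC and decimate (Eq. 20)."""
    h, w = tc.shape
    u = np.fft.fftfreq(h)[:, None] * h            # frequency coordinates centred on DC
    v = np.fft.fftfreq(w)[None, :] * w
    H_A = np.exp(-(u ** 2 + v ** 2) / (2.0 * sigma ** 2))
    low = np.real(np.fft.ifft2(H_A * np.fft.fft2(tc)))
    return low[::ratio, ::ratio]

def adaptive_sigma(tc, i_0, s=0.5, ratio=4, max_steps=100):
    """Algorithm 1: increase sigma until the averaged CC/SSIM score stops improving."""
    def score(sig):
        deg = degrade_freq(tc, sig, ratio)
        cc = np.corrcoef(deg.ravel(), i_0.ravel())[0, 1]
        ss = ssim(deg, i_0, data_range=i_0.max() - i_0.min())
        return 0.5 * (cc + ss)                    # Eq. (21)

    sigma, best = 1.0, score(1.0)
    for _ in range(max_steps):                    # safety cap for the sketch
        cand = score(sigma + s)
        if cand <= best:                          # stop once the score no longer improves
            break
        sigma, best = sigma + s, cand
    return sigma
```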

3.2.6. Optimization Model Algorithm

The model is optimized using the ADMM, which decomposes the original problem into several easier-to-handle subproblems. For ease of optimization, the auxiliary variables $A = H_A\, TC$, $C = \nabla^2 TC$, and $B = D A$ are introduced. Thus, the model described by Equation (17) can be formulated as follows:
$$\min_{A, B, C} \frac{1}{2} \left\| B - I_0 \right\|_F^2 + \frac{\alpha}{2} \left\| \nabla^2 TC - \nabla^2 P \right\|_F^2 + \frac{\beta}{2} \left\| \nabla^2 B - \nabla^2 I_0 \right\|_F^2 + \frac{\gamma}{2} \left\| TC - I_{net} \right\|_F^2 + \frac{\delta}{2} \left\| \nabla^2 TC - \nabla^2 I_{net} \right\|_F^2 + \theta \left\| C \right\|_1 \quad \text{s.t.} \; A = H_A\, TC, \; B = D A, \; C = \nabla^2 TC \qquad (22)$$
The augmented Lagrangian function of the above formula can be expressed as follows:
$$E_L(A, B, TC, H_A, C, \Lambda_1, \Lambda_2, \Lambda_3) = \frac{1}{2} \left\| B - I_0 \right\|_F^2 + \frac{\alpha}{2} \left\| \nabla^2 TC - \nabla^2 P \right\|_F^2 + \frac{\beta}{2} \left\| \nabla^2 B - \nabla^2 I_0 \right\|_F^2 + \frac{\gamma}{2} \left\| TC - I_{net} \right\|_F^2 + \frac{\delta}{2} \left\| \nabla^2 TC - \nabla^2 I_{net} \right\|_F^2 + \theta \left\| C \right\|_1 + \Lambda_1^T (A - H_A\, TC) + \Lambda_2^T (B - D A) + \Lambda_3^T (C - \nabla^2 TC) + \frac{\mu_1}{2} \left\| A - H_A\, TC \right\|_F^2 + \frac{\mu_2}{2} \left\| B - D A \right\|_F^2 + \frac{\mu_3}{2} \left\| C - \nabla^2 TC \right\|_F^2 \qquad (23)$$
where $\Lambda_1$, $\Lambda_2$, and $\Lambda_3$ are Lagrange multipliers, and $\mu_1$, $\mu_2$, and $\mu_3$ are penalty parameters. To minimize the energy function in the above equation, iterative optimization is performed for $A^{(k+1)}$, $B^{(k+1)}$, $TC^{(k+1)}$, $H_A^{(k+1)}$, $C^{(k+1)}$, $\Lambda_1^{(k+1)}$, $\Lambda_2^{(k+1)}$, and $\Lambda_3^{(k+1)}$ until convergence, and thus the final value of $TC$ is obtained, where $k$ denotes the iteration count. The specific optimization process is shown below:
(1)
Optimization for $A^{(k+1)}$
Fixing the other variables, the subproblem for $A^{(k+1)}$ is as follows:
$$A^{(k+1)} = \arg\min_{A} \left( \Lambda_1^{(k)} \right)^T \left( A^{(k)} - H_A^{(k)} TC^{(k)} \right) + \left( \Lambda_2^{(k)} \right)^T \left( B^{(k)} - D A^{(k)} \right) + \frac{\mu_1}{2} \left\| A^{(k)} - H_A^{(k)} TC^{(k)} \right\|_F^2 + \frac{\mu_2}{2} \left\| B^{(k)} - D A^{(k)} \right\|_F^2 \qquad (24)$$
Setting the derivative with respect to $A^{(k+1)}$ to zero, i.e., $\partial E_L / \partial A^{(k+1)} = 0$, $A^{(k+1)}$ is determined by the following equation:
$$A^{(k+1)} = \frac{-\Lambda_1^{(k)} + D^T \Lambda_2^{(k)} + \mu_1 H_A^{(k)} TC^{(k)} + \mu_2 D^T B^{(k)}}{\mu_1 U + \mu_2 D^T D} \qquad (25)$$
where $U$ represents the identity matrix, and the superscript $T$ denotes the transpose operator.
(2)
Optimization for $B^{(k+1)}$
Fixing the other variables, the subproblem for $B^{(k+1)}$ is as follows:
$$B^{(k+1)} = \arg\min_{B} \frac{1}{2} \left\| B^{(k)} - I_0 \right\|_F^2 + \frac{\beta}{2} \left\| \nabla^2 B^{(k)} - \nabla^2 I_0 \right\|_F^2 + \left( \Lambda_2^{(k)} \right)^T \left( B^{(k)} - D A^{(k+1)} \right) + \frac{\mu_2}{2} \left\| B^{(k)} - D A^{(k+1)} \right\|_F^2 \qquad (26)$$
The derivative with respect to $B^{(k+1)}$ is set to zero, i.e., $\partial E_L / \partial B^{(k+1)} = 0$; however, the presence of the Laplacian operator increases the computational complexity of the solution. To enhance the computational efficiency, the FFT and IFFT are employed, which allows a rapid calculation in the frequency domain before transforming back to the spatial domain (a small sketch of this frequency-domain solve is given after this subsection). Therefore, after optimizing $A^{(k+1)}$, $B^{(k+1)}$ can be determined by the following equation:
$$B^{(k+1)} = \mathcal{F}^{-1} \left( \frac{\mathcal{F}\!\left( I_0 + \beta (\nabla^2)^T \nabla^2 I_0 - \Lambda_2^{(k)} + \mu_2 D A^{(k+1)} \right)}{\mathcal{F}\!\left( (1 + \mu_2) U + \beta (\nabla^2)^T \nabla^2 \right)} \right) \qquad (27)$$
(3)
Optimization for $TC^{(k+1)}$
Fixing the other variables, the subproblem for $TC^{(k+1)}$ is as follows:
$$TC^{(k+1)} = \arg\min_{TC} \frac{\alpha}{2} \left\| \nabla^2 TC^{(k)} - \nabla^2 P \right\|_F^2 + \frac{\gamma}{2} \left\| TC^{(k)} - I_{net} \right\|_F^2 + \frac{\delta}{2} \left\| \nabla^2 TC^{(k)} - \nabla^2 I_{net} \right\|_F^2 + \left( \Lambda_1^{(k)} \right)^T \left( A^{(k+1)} - H_A^{(k)} TC^{(k)} \right) + \left( \Lambda_3^{(k)} \right)^T \left( C^{(k)} - \nabla^2 TC^{(k)} \right) + \frac{\mu_1}{2} \left\| A^{(k+1)} - H_A^{(k)} TC^{(k)} \right\|_F^2 + \frac{\mu_3}{2} \left\| C^{(k)} - \nabla^2 TC^{(k)} \right\|_F^2 \qquad (28)$$
The derivative with respect to $TC^{(k+1)}$ is set to zero, i.e., $\partial E_L / \partial TC^{(k+1)} = 0$; due to the presence of the Laplacian operator, the FFT and IFFT are again employed for the solution. Therefore, after optimizing $A^{(k+1)}$, $TC^{(k+1)}$ can be determined by the following equation:
$$TC^{(k+1)} = \mathcal{F}^{-1} \left( \frac{\mathcal{F}\!\left( \alpha (\nabla^2)^T \nabla^2 P + \gamma I_{net} + \delta (\nabla^2)^T \nabla^2 I_{net} + (H_A^{(k)})^T \Lambda_1^{(k)} + (\nabla^2)^T \Lambda_3^{(k)} + \mu_1 (H_A^{(k)})^T A^{(k+1)} + \mu_3 (\nabla^2)^T C^{(k)} \right)}{\mathcal{F}\!\left( (\alpha + \delta + \mu_3) (\nabla^2)^T \nabla^2 + \gamma + \mu_1 (H_A^{(k)})^T H_A^{(k)} \right)} \right) \qquad (29)$$
(4)
Optimization for $C^{(k+1)}$
Fixing the other variables, the subproblem for $C^{(k+1)}$ is as follows:
$$C^{(k+1)} = \arg\min_{C} \theta \left\| C^{(k)} \right\|_1 + \left( \Lambda_3^{(k)} \right)^T \left( C^{(k)} - \nabla^2 TC^{(k+1)} \right) + \frac{\mu_3}{2} \left\| C^{(k)} - \nabla^2 TC^{(k+1)} \right\|_F^2 = \arg\min_{C} \frac{\theta}{\mu_3} \left\| C^{(k)} \right\|_1 + \frac{1}{2} \left\| C^{(k)} - \left( \nabla^2 TC^{(k+1)} - \frac{\Lambda_3^{(k)}}{\mu_3} \right) \right\|_F^2 \qquad (30)$$
Further simplification using the soft-thresholding ($ST$) formula yields the following equation:
$$C^{(k+1)} = ST\!\left( \nabla^2 TC^{(k+1)} - \frac{\Lambda_3^{(k)}}{\mu_3}, \frac{\theta}{\mu_3} \right) = \mathrm{sgn}\!\left( \nabla^2 TC^{(k+1)} - \frac{\Lambda_3^{(k)}}{\mu_3} \right) \max\!\left( \left| \nabla^2 TC^{(k+1)} - \frac{\Lambda_3^{(k)}}{\mu_3} \right| - \frac{\theta}{\mu_3}, 0 \right) \qquad (31)$$
where $\mathrm{sgn}(\cdot)$ is the sign function, and $\max(\cdot)$ is the maximum function.
(5)
Optimization for $\Lambda_1^{(k+1)}$, $\Lambda_2^{(k+1)}$, and $\Lambda_3^{(k+1)}$
Fixing the other variables, the subproblems for $\Lambda_1^{(k+1)}$, $\Lambda_2^{(k+1)}$, and $\Lambda_3^{(k+1)}$ are solved through the gradient ascent method:
$$\Lambda_1^{(k+1)} = \Lambda_1^{(k)} + \varphi^{(k+1)} \left( A^{(k+1)} - H_A^{(k+1)} TC^{(k+1)} \right), \quad \Lambda_2^{(k+1)} = \Lambda_2^{(k)} + \varphi^{(k+1)} \left( B^{(k+1)} - D A^{(k+1)} \right), \quad \Lambda_3^{(k+1)} = \Lambda_3^{(k)} + \varphi^{(k+1)} \left( C^{(k+1)} - \nabla^2 TC^{(k+1)} \right) \qquad (32)$$
where $\varphi$ represents the step length of the gradient ascent, which is updated as follows:
$$\varphi^{(k+1)} = \tau \varphi^{(k)} \qquad (33)$$
where $\tau$ is a penalty parameter with $\tau > 1$, which accelerates the convergence. In summary, the overall optimization process of the multimodal texture correction model is summarized in Algorithm 2, in which $H_A^{(k+1)}$ is optimized using Algorithm 1. The iteration stops when the relative change ($RelCha$) in $TC$ between two consecutive iterations is less than a tolerance $\varepsilon$, and the final $TC$ is obtained accordingly. The relative change is defined as follows:
$$RelCha = \frac{\left\| TC^{(k+1)} - TC^{(k)} \right\|_F}{\left\| TC^{(k)} \right\|_F} < \varepsilon \qquad (34)$$
As the iterations progress, $RelCha$ gradually decreases. Therefore, $\varepsilon$ should be chosen slightly larger than the converged value of $RelCha$ to balance the efficiency and accuracy of the model. For instance, Figure 2 shows the convergence result for the test image in the WorldView-3 dataset. When the number of iterations reached around 15, $RelCha$ tended to converge and approached $10^{-4}$. Therefore, $\varepsilon$ was assigned a value of $10^{-4}$.
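The closed-form updates (27) and (29) rely on the fact that convolution operators such as the Laplacian are diagonalized by the FFT under a periodic-boundary assumption, so the linear systems can be solved point-wise in the frequency domain. The sketch below shows this pattern for a system of the form $((1+\mu)U + \beta(\nabla^2)^T\nabla^2)\,x = \mathrm{rhs}$; it is a generic illustration of the technique, not the paper's exact implementation.

```python
import numpy as np

def laplacian_transfer(shape):
    """Frequency response of the 5-point Laplacian stencil, assuming periodic boundaries."""
    kernel = np.zeros(shape)
    kernel[0, 0] = -4.0
    kernel[0, 1] = kernel[0, -1] = kernel[1, 0] = kernel[-1, 0] = 1.0
    return np.fft.fft2(kernel)

def solve_fft(rhs, mu, beta):
    """Solve ((1 + mu) I + beta * L^T L) x = rhs in the frequency domain (cf. Eq. 27)."""
    L = laplacian_transfer(rhs.shape)
    denom = (1.0 + mu) + beta * np.abs(L) ** 2   # L^T L becomes |L(u, v)|^2 in frequency
    return np.real(np.fft.ifft2(np.fft.fft2(rhs) / denom))
```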
Algorithm 2. Optimization algorithm of the multimodal texture correction model.
Input: PAN image $P$, intensity component of the LRMS image $I_0$.
Initialize: set $TC^{(0)} = P$; initialize $H_A^{(0)}$ by Algorithm 1; $\Lambda_1^{(0)} = \Lambda_2^{(0)} = \Lambda_3^{(0)} = U$; $\varphi^{(0)} = 1$; $\tau = 1.01$; $k = 0$.
While $RelCha > \varepsilon$ do
  Optimize $A^{(k+1)}$ via (25);
  Optimize $B^{(k+1)}$ via (27);
  Optimize $TC^{(k+1)}$ via (29);
  Optimize $H_A^{(k+1)}$ via Algorithm 1;
  Optimize $C^{(k+1)}$ via (31);
  Optimize $\Lambda_1^{(k+1)}$, $\Lambda_2^{(k+1)}$, and $\Lambda_3^{(k+1)}$ via (32);
  $\varphi^{(k+1)} = \tau \varphi^{(k)}$, $k = k + 1$.
End While
Output: texture-corrected image $TC$.
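Two small helpers make the $C$-update of Equation (31) and the stopping rule of Equation (34) concrete (a sketch in NumPy; the Laplacian and dual variable in the commented call are placeholders for the quantities computed inside the loop).

```python
import numpy as np

def soft_threshold(x, tau):
    """Element-wise soft thresholding: sgn(x) * max(|x| - tau, 0), cf. Eq. (31)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def rel_change(tc_new, tc_old):
    """Relative change between consecutive TC iterates, cf. Eq. (34)."""
    return np.linalg.norm(tc_new - tc_old) / np.linalg.norm(tc_old)

# Inside the Algorithm 2 loop these would be used roughly as:
#   C = soft_threshold(laplace(TC) - Lambda3 / mu3, theta / mu3)
#   if rel_change(TC_new, TC_old) < 1e-4:   # tolerance from the WorldView-3 example
#       break
```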

3.3. Adaptive Edge Detail Fusion Model

3.3.1. Adaptive Extraction of TC Image Detail and Applying Edge Protection

After obtaining $TC$ using Algorithm 2, the following formula is used to extract its details $D_{TC}$:
$$D_{TC} = TC - TC_L \qquad (35)$$
where $TC_L$ denotes the low-resolution form of $TC$. To extract the details of $TC$ more accurately, it is known from Equations (2) and (5) that $TC_L$ can be obtained through two methods. The first method obtains a degraded version $TC_D$ of $TC$ using Algorithm 1, which is akin to MRA-based detail extraction and better preserves the spectral information. The second method uses $I_{UP}$ obtained from Equation (4), which is akin to CS-based detail extraction and better preserves the spatial information. Therefore, to combine the advantages of these two methods, an adaptive extraction of $D_{TC}$ is designed as follows:
$$TC_L = \chi_1 I_{UP} + (1 - \chi_1) TC_D, \quad \text{s.t.} \; 0 < \chi_1 < 1 \qquad (36)$$
where $\chi_1$ is the weight coefficient to be determined. Because the correlation and similarity between the source images influence the accuracy of the detail extraction in both methods, the influence coefficient for $I_{UP}$ is set as $x_1$ and that for $TC_D$ as $x_2$:
$$x_1 = \frac{\rho(TC, I_{UP}) + S(TC, I_{UP})}{2}, \quad x_2 = \frac{\rho(TC, TC_D) + S(TC, TC_D)}{2} \qquad (37)$$
Since $x_1$ and $x_2$ do not satisfy the normalization constraint on $\chi_1$, $\chi_1$ should remain within a reasonable range and increase with the relative weight of $x_1$. Therefore, $\chi_1$ can be obtained using the following equation:
$$\chi_1 = 1 - e^{-x_3}, \quad \text{s.t.} \; x_3 = \frac{x_1}{x_1 + x_2} \qquad (38)$$
After substituting $\chi_1$ from the above equation into Equation (36) to obtain $TC_L$, and then substituting $TC_L$ into Equation (35), $D_{TC}$ is finally obtained, which completes the adaptive detail extraction from the $TC$ image. To simultaneously preserve the edge information during the detail extraction, the following edge detection matrix $E_{TC}$ is used to extract edges [34]:
$$E_{TC} = e^{-\frac{\eta}{\left\| \nabla TC \right\|^4 + \zeta}} \qquad (39)$$
where $\eta$ and $\zeta$ are modulation coefficients, and $\nabla$ denotes the gradient operator. Generally, $\eta$ is set to $10^{9}$ and $\zeta$ to $10^{-10}$. Therefore, the edge-protected detail information $F_1$ of the $TC$ image is as follows:
$$F_1 = D_{TC}\, E_{TC} \qquad (40)$$
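A sketch of the adaptive, edge-protected detail extraction of Equations (35)–(40), assuming that the two low-resolution references `i_up` (from Equation (4)) and `tc_d` (from Algorithm 1) are already available; the CC/SSIM score mirrors Equation (37), and the edge-protection constants repeat the values quoted above and should be treated as data-range-dependent assumptions.

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def cc_ssim(a, b):
    """Average of correlation coefficient and SSIM, as in Eq. (37)."""
    cc = np.corrcoef(a.ravel(), b.ravel())[0, 1]
    return 0.5 * (cc + ssim(a, b, data_range=b.max() - b.min()))

def tc_edge_details(tc, i_up, tc_d, eta=1e9, zeta=1e-10):
    """Adaptive, edge-protected detail of the TC image (Eqs. 35-40)."""
    x1, x2 = cc_ssim(tc, i_up), cc_ssim(tc, tc_d)
    chi1 = 1.0 - np.exp(-x1 / (x1 + x2))            # Eq. (38)
    tc_l = chi1 * i_up + (1.0 - chi1) * tc_d        # Eq. (36)
    d_tc = tc - tc_l                                # Eq. (35)

    gy, gx = np.gradient(tc)
    grad_mag = np.hypot(gx, gy)
    e_tc = np.exp(-eta / (grad_mag ** 4 + zeta))    # Eq. (39), edge-protection weight
    return d_tc * e_tc                              # Eq. (40)
```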

3.3.2. Extracting Detail from UPMS Image and Applying Edge Protection

The following formula is used to extract the details $D_M$ of the UPMS image:
$$D_M^i = M_{UP}^i - M_{UPL}^i \qquad (41)$$
where $M_{UPL}$ represents the low-resolution version of the UPMS image. Since $M_{UPL}$ is unknown, the MTF [35,36] of the MS sensor is introduced as a crucial indicator for extracting details from the UPMS image. Therefore, an MTF-matched Gaussian filter $H_{MG}$ is applied to degrade the UPMS image, which yields its low-resolution version. The specific process is shown in the following equation:
$$M_{UPL}^i = H_{MG}\, M_{UP}^i \qquad (42)$$
Substituting the above equation into Equation (41) yields the detail information of the UPMS image. At this point, edge protection is applied to $D_M$ using the edge detection matrix $E_M$ [34]:
$$E_M^i = e^{-\frac{\eta}{\left\| \nabla M_{UP}^i \right\|^4 + \zeta}} \qquad (43)$$
Therefore, the edge-protected detail information $F_2$ of the UPMS image is as follows:
$$F_2^i = D_M^i\, E_M^i \qquad (44)$$
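The UPMS detail extraction of Equations (41)–(44) can be sketched as follows. Here the MTF-matched filter is approximated by a Gaussian whose standard deviation is chosen so that its frequency response equals a nominal gain at the Nyquist frequency of the low-resolution grid; the gain value, the resolution ratio, and this particular approximation are assumptions, since in practice the filter comes from the MS sensor's published MTF.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def mtf_sigma(ratio=4, nyquist_gain=0.3):
    """Std of a Gaussian whose frequency response equals `nyquist_gain`
    at the Nyquist frequency of the low-resolution grid (illustrative MTF match)."""
    return ratio * np.sqrt(-2.0 * np.log(nyquist_gain)) / np.pi

def upms_edge_details(ms_up, eta=1e9, zeta=1e-10, ratio=4, nyquist_gain=0.3):
    """Edge-protected band-wise detail of the UPMS image (Eqs. 41-44)."""
    sigma = mtf_sigma(ratio, nyquist_gain)
    details = np.empty_like(ms_up)
    for b in range(ms_up.shape[-1]):
        band = ms_up[..., b]
        d_m = band - gaussian_filter(band, sigma)             # Eqs. (41)-(42)
        gy, gx = np.gradient(band)
        e_m = np.exp(-eta / (np.hypot(gx, gy) ** 4 + zeta))   # Eq. (43)
        details[..., b] = d_m * e_m                           # Eq. (44)
    return details
```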

3.3.3. Adaptive Edge Detail Fusion Process

After extracting the edge-protected detail information from the $TC$ and UPMS images, $F_1$ and $F_2$ can be fused. However, since the spatial resolution of the UPMS image is lower than that of $TC$, $F_2$ contains less detail information than $F_1$, and directly fusing them may result in a loss of detail. To avoid this, the information in $F_2$ is enhanced to the level of $F_1$ before the fusion. The specific formula is as follows:
$$\xi^i = \arg\min_{\xi^i} \frac{1}{2} \left\| F_1 - \xi^i F_2^i \right\|_F^2 \qquad (45)$$
where $\xi$ is a scaling factor, which is determined using a linear regression model [37]. Therefore, the spatial information enhanced by $\xi$, denoted as $F_3$, is expressed as follows:
$$F_3^i = \xi^i F_2^i \qquad (46)$$
At this point, $F_1$ and $F_3$ can be adaptively fused to obtain the detail information $F$. The specific formula is as follows:
$$F^i = \chi_2 F_1 + (1 - \chi_2) F_3^i \qquad (47)$$
where $\chi_2$ is the weight coefficient. The allocation of weight to the detail information is influenced by the correlation and similarity between the $TC$ and UPMS images. Therefore, Equation (37) can be used to compute the relationship $x_1$ between $TC$ and $I_{UP}$, while ensuring that $\chi_2$ remains within a reasonable range and is positively correlated with $x_1$. The specific formula is as follows:
$$\chi_2 = 1 - e^{-x_1} \qquad (48)$$
Substituting $\chi_2$ from the above equation into Equation (47) yields the final $F$.
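Equations (45)–(48) reduce to a per-band least-squares scaling followed by a convex combination; a minimal sketch, with the CC/SSIM score $x_1$ of Equation (37) supplied by a helper such as `cc_ssim` from the earlier sketch:

```python
import numpy as np

def fuse_details(f1, f2, x1):
    """Adaptive edge-detail fusion (Eqs. 45-48).

    f1 : (H, W) edge-protected TC detail
    f2 : (H, W, B) edge-protected UPMS details
    x1 : CC/SSIM score between TC and I_UP (Eq. 37)
    """
    chi2 = 1.0 - np.exp(-x1)                                  # Eq. (48)
    fused = np.empty_like(f2)
    for b in range(f2.shape[-1]):
        denom = np.sum(f2[..., b] ** 2) + 1e-12               # guard against a zero band
        xi = np.sum(f1 * f2[..., b]) / denom                  # Eq. (45), closed-form LS
        f3 = xi * f2[..., b]                                  # Eq. (46)
        fused[..., b] = chi2 * f1 + (1.0 - chi2) * f3         # Eq. (47)
    return fused
```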

3.3.4. Final Injection of Spatial Edge Detail Information

By substituting $F$ from Equation (47) into the injection model below, the final HRMS image is obtained:
$$M_{HR}^i = M_{UP}^i + g^i \frac{M_{UP}^i}{\frac{1}{B} \sum_{i=1}^{B} M_{UP}^i} F^i \qquad (49)$$
where $g$ represents the scaling factor for the injected details, which is adaptively determined by the following formula:
$$g^i = \frac{\sigma^2(TC) + \mathrm{cov}(TC, M_{UP}^i)}{\sigma^2(TC)} \qquad (50)$$
where $\mathrm{cov}(\cdot)$ represents the covariance function, and $\sigma^2(\cdot)$ represents the variance function.
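A sketch of the final injection of Equations (49) and (50) under the same array conventions as the earlier snippets; the small constant added to the band mean is an illustrative guard against division by zero, not part of the paper's formulation.

```python
import numpy as np

def inject_details(ms_up, tc, fused):
    """Band-wise detail injection (Eqs. 49-50)."""
    mean_band = ms_up.mean(axis=-1) + 1e-12          # (1/B) * sum of bands, guarded
    out = np.empty_like(ms_up)
    var_tc = tc.var()
    for b in range(ms_up.shape[-1]):
        g = (var_tc + np.cov(tc.ravel(), ms_up[..., b].ravel())[0, 1]) / var_tc   # Eq. (50)
        out[..., b] = ms_up[..., b] + g * ms_up[..., b] / mean_band * fused[..., b]  # Eq. (49)
    return out
```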

4. Experiments and Results

4.1. Experimental Design

In Section 4, to demonstrate the performance advantages and effectiveness of the proposed algorithm, the proposed method is compared with ten methods: GSA [38], NIHS [39], BDSD-PC [40], FusionNet [41], ATWT-M3 [42], BT-H [43], SR-D [44], DMPIF [22], CDIF [25], and A-PNN [19]. Extensive experiments were conducted using four datasets: GaoFen-2, QuickBird, WorldView-2, and WorldView-3 [45]. Each image pair in the datasets included one MS image and one PAN image. In the GaoFen-2 and QuickBird datasets, the MS images had four bands, whereas in the WorldView-2 and WorldView-3 datasets, the MS images had eight bands. All datasets contained PAN images with a single band.
To better evaluate the performance of the proposed algorithm, two comparison experiments were conducted. The first experiment was a simulation experiment, i.e., a reduced-scale (RS) experiment. According to the Wald protocol, the original MS image was used as a reference image, also known as the ground truth (GT) image [46]. In this experiment, the original MS and PAN images were downsampled by a factor of four. The downsampled images served as the source images for the RS experiment. The algorithm proposed in this paper was used to fuse these source images, and the fused image was compared with the GT image. A smaller difference indicates better performance. Therefore, in this experiment, each band of the GT image was cropped to 256 × 256 pixels, which resulted in each band of the MS image being cropped to 64 × 64 pixels, and the PAN image was cropped to 256 × 256 pixels.
The second experiment was a real experiment, i.e., a full-scale (FS) experiment. After successfully implementing the RS experiment, the FS experiment could be conducted, where the source images were directly fused. Since there was no GT image available as a reference, the polynomial kernel upsampling (EXP) [47] method with twenty-three coefficients was used as the spectral benchmark. Additionally, each band of the original MS image was cropped to 128 × 128 pixels, and the PAN image was cropped to 512 × 512 pixels. As a result, each band of the fused image was also 512 × 512 pixels. Detailed information about these four datasets is summarized in Table 1.
To evaluate and compare the image quality of different methods, combined objective and subjective evaluation criteria were adopted. In the RS experiment, nine commonly used objective evaluation metrics were employed: the Q2n index (Q4 for four-band datasets, Q8 for eight-band datasets) [48] to assess the spatial and spectral qualities, the peak signal-to-noise ratio (PSNR) [49] to measure the error between the reconstructed and reference images, the universal image quality index (UIQI) [50] to comprehensively evaluate the quality differences and similarities after the fusion, the relative average spectral error (RASE) [51] to evaluate the average spectral differences before and after the fusion, the root-mean-square error (RMSE) to evaluate the overall difference between the fused image and the reference image, the error relative global dimensionless synthesis (ERGAS) [52] to indicate the distortion levels in spatial and spectral information, the spectral correlation coefficient (SCC) [53] to measure the preservation of the spectral information in the images, the correlation coefficient (CC) [32] to indicate the degree of correlation between the fused image and the reference image, and the structural similarity index measure (SSIM) [33] to evaluate the similarity between the fused image and the reference image. The subjective evaluation visualized the fused MS image by extracting the red (R), green (G), and blue (B) bands to display true-color fused images, which provided a more intuitive reflection of the quality differences in the image.
In the FS experiment, three additional objective evaluation metrics were used: D λ [54] for the spectral distortion during the fusion, D S [54] for the spatial distortion during the fusion, and the quality without reference (QNR) [54] to assess the quality of the fused images. In the above evaluation metrics, the ideal values are as follows: 1 for Q2n, UIQI, SCC, CC, SSIM, and QNR; 0 for RASE, RMSE, ERGAS, D λ , and D S ; and infinity for PSNR. The datasets used for the RS and FS experiments discussed in Section 4.2 and Section 4.3 are illustrated in Table 1. Each experiment included subjective and objective evaluations of a pair of images from their respective datasets. All experiments discussed in Section 4 were conducted on a PC equipped with an Intel Core i7-12700 CPU running at a base speed of 2.10 GHz with 32 GB of memory. The experimental platform used was MATLAB R2021b.
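For reference, simplified versions of a few of the reduced-scale metrics can be computed as below; these follow the common textbook definitions and are assumptions insofar as the cited implementations may differ in details such as block sizes or band weighting.

```python
import numpy as np

def rmse(fused, ref):
    """Root-mean-square error between the fused and reference images."""
    return np.sqrt(np.mean((fused - ref) ** 2))

def cc(fused, ref):
    """Correlation coefficient between the fused and reference images."""
    return np.corrcoef(fused.ravel(), ref.ravel())[0, 1]

def ergas(fused, ref, ratio=4):
    """ERGAS: 100/ratio * sqrt(mean over bands of (RMSE_b / mean_b)^2)."""
    terms = [(rmse(fused[..., b], ref[..., b]) / ref[..., b].mean()) ** 2
             for b in range(ref.shape[-1])]
    return 100.0 / ratio * np.sqrt(np.mean(terms))
```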

4.2. Reduced-Scale Experiments

4.2.1. QuickBird Dataset

For the RS experiment, Figure 3 shows the subjective evaluation fusion results of the proposed method and various compared methods on the QuickBird dataset, where the GT image served as the reference. To clarify the spatial and spectral information of the images, the local fusion results were magnified. From the enlarged red rectangles in the local area, it can be observed that the GSA and BT-H methods show excessive details in the roofs of the houses. The images produced by the NIHS, SR-D, BDSD-PC, and ATWT-M3 methods were relatively blurry and darker. The images produced by the FusionNet method suffered from issues such as excessive brightness and noticeable deviations in the edge information. The CDIF method maintained good edge information but introduced artifacts. Despite the fact the DMPIF and A-PNN methods preserved the spatial information relatively well, they exhibited excessive color information at the edges, which resulted in relatively severe spectral distortion. In contrast, the results of the proposed method in this paper closely approximated the GT image, which effectively preserved both the spatial and spectral information. The objective evaluation fusion results of Figure 3 are shown in Table 2, where the values inside parentheses indicate the ideal outcomes, and the metrics highlighted in bold black text indicate the optimal results. It can be observed that compared with the other ten methods, the proposed method achieved the superior results across all evaluation metrics and required relatively less time.

4.2.2. WorldView-2 Dataset

Figure 4 presents the subjective evaluation fusion results of the WorldView-2 dataset from the various compared methods. From the enlarged red rectangles, it can be observed that the GSA, BDSD-PC, ATWT-M3, and SR-D methods exhibited issues such as poor image clarity, severe spatial distortion, and darker colors. The NIHS method showed issues, with the excessive injection of spatial information in certain areas. In the FusionNet method, there was an issue with inaccurately preserving spatial detail information. The CDIF method showed significant spatial and spectral distortion issues. The DMPIF method lacked clarity compared with the GT image and suffered from severe artifact problems. In the A-PNN and BT-H methods, spectral distortions were present in some areas, along with the poor retention of spatial information. The proposed method in this paper aligned most closely with the GT image and visually outperformed other compared methods. Table 3 displays the objective evaluation fusion results from Figure 4. Compared with the other ten methods, the proposed method achieved superior results across all evaluation metrics and operated in a relatively shorter time.

4.2.3. WorldView-3 Dataset

Figure 5 illustrates the subjective evaluation fusion results of the WorldView-3 dataset across various methods. From the enlarged red rectangles, it is evident that the GSA method exhibited darker colors on the rooftops compared with the GT image. The NIHS method showed some artifacts, which impacted the spatial information quality of the image. The images produced by the BDSD-PC, ATWT-M3, and SR-D methods appeared blurry. The images from the BT-H method had excessively high brightness and exhibited significant spectral distortion. The FusionNet method introduced excessive detail into the images, which resulted in some color alterations. Although the CDIF method preserved the spectral information well, it lacked details and suffered from significant spatial distortion. The DMPIF and A-PNN methods exhibited some color changes, which led to severe spectral distortion. The proposed method in this paper aligned closest with the GT image, as seen by the superior subjective visual results achieved. Table 4 presents the objective evaluation fusion results from Figure 5. It is evident that our method outperformed others across all evaluation metrics, with shorter processing times.

4.3. Full-Scale Experiments

4.3.1. GaoFen-2 Dataset

For the FS experiment, Figure 6 presents the subjective evaluation fusion results of the proposed method compared with the other methods, where the EXP image was used as the spectral reference. From the enlarged red rectangles, it is evident that the GSA, FusionNet, BT-H, and A-PNN methods exhibited significant color changes compared with the EXP image, which resulted in severe spectral distortion. The images produced by the DMPIF method exhibited significant problems, with artifacts and noticeable color changes. The NIHS, BDSD, ATWT-M3, and SR-D methods produced blurry images with substantial spatial distortion. The CDIF method introduced extraneous color artifacts and showed blurriness in some regions. The proposed method in this paper enhanced the spatial resolution of the UPMS image while maintaining spectral information close to the EXP image, and thus, yielded superior visual results compared with the other methods. The objective evaluation fusion results of Figure 6 are shown in Table 5. The results indicate that our method outperformed the others in the D s and QNR metrics and was slightly inferior in the D λ metric, where it achieved the highest spatial resolution with minimal spectral loss.

4.3.2. WorldView-2 Dataset

Figure 7 presents the subjective evaluation fusion results of the WorldView-2 dataset from the various compared methods. From the enlarged red rectangle, it can be seen that the GSA and FusionNet methods exhibited more noticeable color changes compared with the EXP images, which led to more pronounced spectral distortions. The images processed by the NIHS, BDSD-PC, and ATWT-M3 methods were significantly blurry, with pronounced spatial distortions. The BT-H method resulted in images with darker colors, which led to some spectral distortion. The SR-D method exhibited poor edge preservation and significant deviations in spatial detail information injection. The DMPIF and A-PNN methods exhibited unnecessary color markers and produced artifacts. The CDIF method exhibited deviations in edge preservation between the red and green in the images, and the colors changed. Our proposed method, while maintaining spectral proximity to EXP, preserved accurate spatial information and visually outperformed the other compared methods. The objective evaluation fusion results of Figure 7 are shown in Table 6. The results indicate that our method outperformed the others in terms of the D s and QNR metrics, while slightly trailing the NIHS method in the D λ metric. This achievement ensured the highest spatial resolution of the images with minimal spectral loss. In summary, across the RS and FS experiments on the four datasets, the proposed method in this paper consistently outperformed other compared methods. It effectively balanced the spectral and spatial information to achieve a superior image quality in less time.

4.4. Parameters Analysis

From Algorithm 2, it is evident that certain unknown parameters still required determination. To enhance the stability of the proposed model in this paper, a grid search method was employed to adaptively determine these parameters. Parameters μ 1 , μ 2 , and μ 3 play similar roles in Equation (23). Therefore, to reduce the parameter complexity, we set μ = μ 1 = μ 2 = μ 3 . At this point, the six parameters that needed to be determined are α , β , γ , δ , θ , and μ . Since parameters α and β describe the relationship between the source image and the T C image, we first combined them to determine these two parameters. Next, parameters γ and δ describe the spectral and spatial fidelities of the A-PNN-based deep plug-and-play term, and thus, we combined them to determine the parameters. Finally, we determined the remaining two parameters: θ and μ .
In the RS experiments, the Q2n metric was used to evaluate the spatial and spectral qualities of the fused images. In the FS experiment, the QNR metric was used for the same purpose. The results are shown in Figure 8. First, with the other parameters fixed, α and β were searched. As seen in Figure 8a–e, the optimal values for the QuickBird dataset were 24.7 and 4 × 10 4 ; for the WorldView-2 dataset in the RS experiment, they were 18.4 and 2 × 10 2 ; for the WorldView-3 dataset, they were 15 and 2 × 10 3 ; for the GaoFen-2 dataset, they were 33.1 and 1 × 10 7 ; and for the WorldView-2 dataset in the FS experiment, they were 9.2 and 8 × 10 4 , respectively. Similarly, searching for γ and δ , Figure 8f–j show that the optimal values for the QuickBird dataset were 9 × 10 1 and 7 × 10 3 ; for the WorldView-2 dataset in the RS experiment, they were 1.6 and 6 × 10 4 ; for the WorldView-3 dataset, they were 9 × 10 1 and 6; for the GaoFen-2 dataset, they were 3 × 10 2 and 1 × 10 6 ; and for the WorldView-2 dataset in the FS experiment, they were 5 × 10 1 and 1, respectively. Finally, searching for the remaining two parameters, θ and μ , Figure 8k–o indicate that the optimal values for the QuickBird dataset were 3.8 × 10 2 and 6 × 10 1 ; for the WorldView-2 dataset in the RS experiment, they were 1.08 × 10 2 and 2; for the WorldView-3 dataset, they were 23.5 and 2.7; for the GaoFen-2 dataset, they were 1.6 × 10 2 and 9.8; and for the WorldView-2 dataset in the FS experiment, they were 12 and 2.4, respectively. In summary, after determining these optimal parameter values, the algorithm in this study achieved its superior fusion performance.

4.5. Ablation Study

Using a pair of images from the WorldView-3 dataset in Section 4.2.3, an ablation study was conducted to validate the effectiveness of the algorithm proposed in this paper. The algorithm consists of the multimodal texture correction model (MTC) and the adaptive edge detail fusion model (AEDF), which were divided into five ablation models, as detailed in Table 7. For Model 1, which lacked both MTC and AEDF, the fusion images were generated using the following injection model:
$$M_{HR}^i = M_{UP}^i + g^i \frac{M_{UP}^i}{\frac{1}{B} \sum_{i=1}^{B} M_{UP}^i} (P - I_{UP}) \qquad (51)$$
In Models 2 to 4, Equation (17) was configured according to the parameters in Table 7, so that $TC$ replaced $P$ in the above equation. The injection model used was as follows:
$$M_{HR}^i = M_{UP}^i + g^i \frac{M_{UP}^i}{\frac{1}{B} \sum_{i=1}^{B} M_{UP}^i} (TC - I_{UP}) \qquad (52)$$
Model 5 corresponded to the full algorithm proposed in this paper. Table 7 presents the objective evaluation fusion results of Models 1 to 5, indicating improved performance as the different components were added. Figure 9 illustrates the subjective evaluation fusion results of Models 1 to 5. From the figure, it is evident that the subjective fusion quality of Models 1 to 5 progressively improved and approached the GT image, which further validates the effectiveness of the algorithm proposed in this study.

5. Conclusions

Due to the low correlation and similarity between MS and PAN images acquired from different sensors, direct fusion can lead to significant spectral and spatial distortions. Moreover, achieving an ideal HRMS image requires accurately injecting spatial information from the PAN image into the UPMS image. However, inaccurate spatial information injection can degrade the spatial resolution of the HRMS image. To address these issues, this paper proposes a method based on multimodal texture correction and adaptive edge detail fusion models. The primary objective was to obtain a T C image that inherits precise spatial detail information from the PAN image while maintaining high correlation and similarity with the MS image. Several constraints were established for this purpose: intensity constraint between T C and I 0 ; gradient constraint between T C , PAN, and I 0 ; and an A-PNN-based deep plug-and-play constraint between T C and I n e t . An adaptive degradation filter algorithm is proposed to accurately maintain these constraints. Ultimately, a multimodal texture correction model was constructed. The ADMM algorithm is employed to solve this problem and generate T C , which can effectively replace the functionality of the PAN image. Since spatial detail information is not solely present in T C but also exists in LRMS image, an adaptive edge detail fusion model is proposed. This model extracts detail information from both the T C and UPMS images while applying edge protection. To extract detail information more accurately, an adaptive algorithm is used to extract details from T C , and MTF-matched Gaussian filters are used to extract details from the UPMS image. The edge-protected details from T C are adaptively fused with the enhanced edge-protected details from the UPMS image. Finally, the fused spatial details are injected into the UPMS image to generate the final HRMS image. Extensive comparative experiments in RS and FS validated the performance advantages of the proposed algorithm. A parameter analysis and ablation study further confirmed its effectiveness by demonstrating superior fusion results.
In the multimodal texture correction model, iterative optimization conducted on two-dimensional images significantly improved the solving efficiency, and the three correction prior terms effectively preserved the spatial and spectral information. However, the model still has some drawbacks: the correction prior terms include unknown parameters that must be determined experimentally, which can consume substantial computational resources and time. In the adaptive edge detail fusion model, the edge detail information from both the TC and UPMS images is comprehensively considered to obtain accurate spatial information, yet a mismatch between the spatial information injected into the UPMS images and their spectral information still persists. Therefore, our future work will focus on adaptively determining the remaining unknown parameters in the pansharpening model and exploring more suitable injection models to enhance the overall performance and efficiency.

Author Contributions

Conceptualization, E.W.; methodology, E.W.; validation, J.W. and L.D.; formal analysis, E.W.; investigation, J.W. and L.D.; writing—original draft preparation, E.W.; writing—review and editing, D.L. and J.A.B.; visualization, E.W.; supervision, D.L. and J.A.B.; project administration, D.L.; funding acquisition, L.W. All authors read and agreed to the published version of this manuscript.

Funding

This research was funded by the Fundamental Research Funds for the Central Universities under nos. 04442024040 and 04442024041. This work was supported in part by the National Natural Science Foundation of China under grant 62071084.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Acknowledgments

The authors sincerely thank the academic editors and reviewers for their useful comments and constructive suggestions.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. The proposed model framework diagram.
Figure 2. Iterative convergence results from the WorldView-3 dataset.
Figure 3. Subjective evaluation fusion results of the RS images in the QuickBird dataset.
Figure 4. Subjective evaluation fusion results of the RS images in the WorldView-2 dataset.
Figure 5. Subjective evaluation fusion results of the RS images in the WorldView-3 dataset.
Figure 6. Subjective evaluation fusion results of the FS images in the GaoFen-2 dataset.
Figure 7. Subjective evaluation fusion results of the FS images in the WorldView-2 dataset.
Figure 8. Parameter settings for four different datasets in RS and FS experiments.
Figure 9. Subjective evaluation fusion results of different ablation combination models from the WorldView-3 dataset.
Table 1. Detailed information of the datasets used in this experiment.

Satellite | MS Bands | Experiment Categorization | Sensor | Sizes | Resolution (m)
GaoFen-2 | Blue (B), green (G), red (R), and near infrared (NIR) | FS | MS | 128 × 128 × 4 | 4
GaoFen-2 | | FS | PAN | 512 × 512 | 1
QuickBird | Blue (B), green (G), red (R), and near infrared (NIR) | RS | MS | 64 × 64 × 4 | 2.44
QuickBird | | RS | PAN | 256 × 256 | 0.61
WorldView-2 | Coastal blue, B, G, yellow, R, red edge, NIR1, and NIR2 | RS/FS | MS | 64 × 64 × 8 / 128 × 128 × 8 | 2
WorldView-2 | | RS/FS | PAN | 256 × 256 / 512 × 512 | 0.5
WorldView-3 | Coastal blue, B, G, yellow, R, red edge, NIR1, and NIR2 | RS | MS | 64 × 64 × 8 | 1.24
WorldView-3 | | RS | PAN | 256 × 256 | 0.31
Table 2. Objective evaluation fusion results of the RS images in the QuickBird dataset.

Methods | Q4 (1) | PSNR (+∞) | UIQI (1) | RASE (0) | RMSE (0) | ERGAS (0) | SCC (1) | CC (1) | SSIM (1) | Time (s)
GSA | 0.7204 | 28.0864 | 0.8680 | 45.9107 | 82.0974 | 11.9279 | 0.8384 | 0.8942 | 0.8459 | 0.09
NIHS | 0.7359 | 30.3936 | 0.8389 | 37.2876 | 64.8527 | 9.2502 | 0.7884 | 0.8790 | 0.8126 | 0.02
BDSD-PC | 0.7787 | 31.0244 | 0.8728 | 34.3589 | 60.0012 | 8.8712 | 0.8241 | 0.8929 | 0.8505 | 0.11
FusionNet | 0.7661 | 30.1892 | 0.9052 | 36.9432 | 65.2170 | 7.5823 | 0.8379 | 0.9020 | 0.8843 | 0.47
ATWT-M3 | 0.7488 | 30.3354 | 0.8406 | 37.5634 | 65.3632 | 9.2636 | 0.8173 | 0.8747 | 0.8199 | 0.12
BT-H | 0.7242 | 28.9158 | 0.8928 | 42.6022 | 75.3440 | 8.2748 | 0.8588 | 0.9072 | 0.8758 | 0.03
SR-D | 0.7816 | 31.0340 | 0.8772 | 34.2226 | 59.8619 | 8.4613 | 0.8013 | 0.8872 | 0.8515 | 0.69
DMPIF | 0.6629 | 30.2939 | 0.8904 | 36.4980 | 64.5438 | 9.3122 | 0.8485 | 0.8572 | 0.8544 | 4.14
CDIF | 0.8426 | 32.0266 | 0.9133 | 31.0258 | 53.8422 | 7.5931 | 0.7077 | 0.9028 | 0.8920 | 32.31
A-PNN | 0.8315 | 31.5654 | 0.9071 | 32.5016 | 56.5681 | 8.0506 | 0.7775 | 0.8905 | 0.8840 | 0.24
Proposed | 0.8579 | 32.5524 | 0.9272 | 28.5341 | 50.0224 | 7.0273 | 0.8595 | 0.9178 | 0.9123 | 0.66
Table 3. Objective evaluation fusion results of the RS images in the WorldView-2 dataset.

Methods | Q8 (1) | PSNR (+∞) | UIQI (1) | RASE (0) | RMSE (0) | ERGAS (0) | SCC (1) | CC (1) | SSIM (1) | Time (s)
GSA | 0.8154 | 24.5476 | 0.8886 | 23.8290 | 126.7365 | 5.8211 | 0.9066 | 0.9140 | 0.8839 | 0.04
NIHS | 0.8718 | 26.5331 | 0.9432 | 19.2262 | 101.7211 | 4.7072 | 0.8983 | 0.9167 | 0.9363 | 0.01
BDSD-PC | 0.8484 | 25.5758 | 0.9340 | 21.0005 | 112.0279 | 5.3739 | 0.8675 | 0.9095 | 0.9247 | 0.10
FusionNet | 0.8979 | 26.8927 | 0.9555 | 18.0786 | 96.3830 | 4.4973 | 0.8972 | 0.9194 | 0.9489 | 0.41
ATWT-M3 | 0.8262 | 25.1100 | 0.9234 | 22.9734 | 120.8218 | 5.5593 | 0.8554 | 0.8936 | 0.9104 | 0.25
BT-H | 0.8836 | 24.8875 | 0.9597 | 22.0082 | 119.0180 | 4.7140 | 0.9211 | 0.9300 | 0.9543 | 0.08
SR-D | 0.8475 | 25.4401 | 0.9347 | 21.4360 | 114.1270 | 5.2281 | 0.8042 | 0.8972 | 0.9220 | 0.97
DMPIF | 0.8910 | 27.1957 | 0.9575 | 17.0660 | 91.8009 | 4.2016 | 0.9054 | 0.9217 | 0.9507 | 4.47
CDIF | 0.8407 | 24.9159 | 0.9321 | 22.7995 | 121.3268 | 5.5670 | 0.6384 | 0.8878 | 0.9157 | 32.67
A-PNN | 0.9149 | 27.7784 | 0.9617 | 16.2140 | 86.6690 | 4.0000 | 0.9143 | 0.9262 | 0.9562 | 0.19
Proposed | 0.9483 | 29.3102 | 0.9732 | 13.2903 | 71.7632 | 3.3109 | 0.9412 | 0.9389 | 0.9695 | 0.67
Table 4. Objective evaluation fusion results of the RS images in the WorldView-3 dataset.

Methods | Q8 (1) | PSNR (+∞) | UIQI (1) | RASE (0) | RMSE (0) | ERGAS (0) | SCC (1) | CC (1) | SSIM (1) | Time (s)
GSA | 0.8751 | 31.5699 | 0.9319 | 14.2936 | 58.2575 | 3.3419 | 0.9211 | 0.9377 | 0.9261 | 0.04
NIHS | 0.7839 | 29.8210 | 0.8978 | 17.8321 | 72.3206 | 4.1553 | 0.8691 | 0.9135 | 0.8865 | 0.01
BDSD-PC | 0.8185 | 30.3303 | 0.9203 | 16.1767 | 66.4502 | 3.9888 | 0.8998 | 0.9284 | 0.9119 | 0.10
FusionNet | 0.8897 | 31.6441 | 0.9517 | 13.4228 | 55.9061 | 3.2604 | 0.9047 | 0.9357 | 0.9442 | 0.66
ATWT-M3 | 0.8025 | 29.6295 | 0.8928 | 18.7549 | 75.3322 | 4.3115 | 0.8640 | 0.9068 | 0.8794 | 0.42
BT-H | 0.8032 | 28.0027 | 0.9486 | 20.3187 | 84.6757 | 4.2468 | 0.9219 | 0.9388 | 0.9404 | 0.06
SR-D | 0.8178 | 29.9302 | 0.9103 | 17.2519 | 70.4718 | 4.0384 | 0.8434 | 0.9054 | 0.8962 | 1.13
DMPIF | 0.8684 | 31.8053 | 0.9511 | 13.2660 | 55.0638 | 3.1358 | 0.9279 | 0.9357 | 0.9425 | 4.64
CDIF | 0.8573 | 30.5537 | 0.9294 | 15.9662 | 65.3548 | 3.7505 | 0.7900 | 0.9130 | 0.9169 | 36.70
A-PNN | 0.8937 | 31.0437 | 0.9386 | 14.1669 | 59.1147 | 3.4508 | 0.8945 | 0.9268 | 0.9291 | 0.30
Proposed | 0.9206 | 32.8589 | 0.9579 | 11.5778 | 48.2269 | 2.8134 | 0.9308 | 0.9470 | 0.9529 | 1.03
Table 5. Objective evaluation fusion results of the FS images in the GaoFen-2 dataset.

Methods | Dλ (0) | Ds (0) | QNR (1) | Time (s)
GSA | 0.2093 | 0.1456 | 0.6756 | 0.12
NIHS | 0.0047 | 0.1132 | 0.8826 | 0.03
BDSD-PC | 0.0067 | 0.1128 | 0.8812 | 0.15
FusionNet | 0.0891 | 0.0774 | 0.8403 | 2.58
ATWT-M3 | 0.0076 | 0.1504 | 0.8431 | 0.89
BT-H | 0.1434 | 0.1504 | 0.7278 | 0.10
SR-D | 0.0092 | 0.1168 | 0.8751 | 2.11
DMPIF | 0.0693 | 0.0970 | 0.8404 | 17.31
CDIF | 0.0227 | 0.0590 | 0.9196 | 120.00
A-PNN | 0.1177 | 0.1226 | 0.7741 | 1.07
Proposed | 0.0263 | 0.0516 | 0.9234 | 3.29
Table 6. Objective evaluation fusion results of the FS images in the WorldView-2 dataset.

Methods | Dλ (0) | Ds (0) | QNR (1) | Time (s)
GSA | 0.1208 | 0.1489 | 0.8511 | 0.52
NIHS | 0.0004 | 0.0702 | 0.9298 | 0.10
BDSD-PC | 0.0028 | 0.0778 | 0.9222 | 0.96
FusionNet | 0.0224 | 0.0805 | 0.9195 | 1.32
ATWT-M3 | 0.0072 | 0.0857 | 0.9143 | 1.68
BT-H | 0.0598 | 0.0795 | 0.8654 | 0.09
SR-D | 0.0593 | 0.0596 | 0.8847 | 4.00
DMPIF | 0.0112 | 0.0889 | 0.9111 | 25.40
CDIF | 0.0156 | 0.0767 | 0.9233 | 140.74
A-PNN | 0.0320 | 0.1121 | 0.8879 | 0.58
Proposed | 0.0007 | 0.0560 | 0.9433 | 4.64
Table 7. Objective evaluation fusion results of different ablation combination models from the WorldView-3 dataset.

Models | MTC (α = 0, β = 0, γ = 0, δ = 0, θ = 0) / AEDF | Q8 (1) | PSNR (+∞) | UIQI (1) | RASE (0) | ERGAS (0) | SCC (1)
1 | × | 0.8252 | 28.7842 | 0.9044 | 17.8615 | 4.5200 | 0.8993
2 | ×× | 0.8424 | 29.3984 | 0.9093 | 16.8916 | 4.1640 | 0.8913
3 | ×××× | 0.8539 | 29.4561 | 0.9169 | 16.5778 | 4.1785 | 0.9018
4 | ×××××× | 0.8787 | 30.6656 | 0.9326 | 14.5402 | 3.6459 | 0.9115
5 | ××××× | 0.9206 | 32.8589 | 0.9579 | 11.5778 | 2.8134 | 0.9308

