Pansharpening Based on Multimodal Texture Correction and Adaptive Edge Detail Fusion

Liu, Danfeng; Wang, Enyuan; Wang, Liguo; Benediktsson, Jón Atli; Wang, Jianyu; Deng, Lei

doi:10.3390/rs16162941

Open AccessArticle

Pansharpening Based on Multimodal Texture Correction and Adaptive Edge Detail Fusion

by

Danfeng Liu

^1,*,

Enyuan Wang

¹,

Liguo Wang

¹,

Jón Atli Benediktsson

²

,

Jianyu Wang

¹ and

Lei Deng

¹

College of Information and Communication Engineering, Dalian Minzu University, Dalian 116600, China

²

Faculty of Electrical and Computer Engineering, University of Iceland, 107 Reykjavik, Iceland

^*

Author to whom correspondence should be addressed.

Remote Sens. 2024, 16(16), 2941; https://doi.org/10.3390/rs16162941

Submission received: 30 June 2024 / Revised: 27 July 2024 / Accepted: 8 August 2024 / Published: 11 August 2024

(This article belongs to the Special Issue Deep Neural Networks for Hyperspectral Remote Sensing Image Processing)

Download

Browse Figures

Versions Notes

Abstract

Pansharpening refers to the process of fusing multispectral (MS) images with panchromatic (PAN) images to obtain high-resolution multispectral (HRMS) images. However, due to the low correlation and similarity between MS and PAN images, as well as inaccuracies in spatial information injection, HRMS images often suffer from significant spectral and spatial distortions. To address these issues, a pansharpening method based on multimodal texture correction and adaptive edge detail fusion is proposed in this paper. To obtain a texture-corrected (

T_{C}

) image that is highly correlated and similar to the MS image, the target-adaptive CNN-based pansharpening (A-PNN) method is introduced. By constructing a multimodal texture correction model, intensity, gradient, and A-PNN-based deep plug-and-play correction constraints are established between the

T_{C}

and source images. Additionally, an adaptive degradation filter algorithm is proposed to ensure the accuracy of these constraints. Since the

T_{C}

image obtained can effectively replace the PAN image and considering that the MS image contains valuable spatial information, an adaptive edge detail fusion algorithm is also proposed. This algorithm adaptively extracts detailed information from the

T_{C}

and MS images to apply edge protection. Given the limited spatial information in the MS image, its spatial information is proportionally enhanced before the adaptive fusion. The fused spatial information is then injected into the upsampled multispectral (UPMS) image to produce the final HRMS image. Extensive experimental results demonstrated that compared with other methods, the proposed algorithm achieved superior results in terms of both subjective visual effects and objective evaluation metrics.

Keywords:

pansharpening; multimodal texture correction; adaptive edge detail fusion

1. Introduction

In recent decades, with the launch of an increasing number of remote sensing satellites, the imagery data captured by these satellites has been widely utilized in fields such as urban planning, climate monitoring, and agriculture, leading to rapid advancements in remote sensing technology [1]. However, due to limitations in satellite imaging sensor hardware, it is difficult to simultaneously obtain both a high spatial resolution and high spectral resolution in multispectral images. Multispectral (MS) images provide rich spectral information but with a lower spatial resolution, whereas panchromatic (PAN) images offer a higher spatial resolution but poorer spectral information [2]. Therefore, employing pansharpening techniques to enhance the spatial resolution of low-resolution multispectral (LRMS) images involves fusing LRMS and PAN images, and thus, leveraging their respective strengths to achieve high-resolution multispectral (HRMS) images [3]. The HRMS images obtained through pansharpening can be widely used in fields such as crop classification, target extraction, ecological monitoring, and geological research. This makes pansharpening technology an indispensable part of the production and application of remote sensing images. Therefore, conducting in-depth research on pansharpening technology has extensive application prospects and practical significance [4].

Due to the rapid development of pansharpening techniques in recent years, they can be categorized into four main types: component substitution (CS)-based methods, multi-resolution analysis (MRA)-based methods, variational optimization (VO)-based methods, and deep learning (DL)-based methods [5]. CS-based methods involve separating the upsampled multispectral (UPMS) image into spatial and spectral components, then substituting the spatial components with information from the PAN image. Common classical CS-based methods include intensity–hue–saturation (IHS) [6], principal component analysis (PCA) [7], partial replacement adaptive component substitution (PRACS) [8], and band-dependent spatial detail (BDSD) [9]. CS-based methods typically preserve spatial details well, thus achieving high spatial quality, and are straightforward to implement with high computational efficiency. However, they can suffer from significant spectral distortions.

MRA-based methods decompose UPMS and PAN images into multiple spatial scales and perform fusion separately at each scale. Common classical MRA-based methods include the generalized Laplacian pyramid (GLP) [10], wavelet transform (WT) [11], nonsubsampled contourlet transform (NSCT) [12], and nonsubsampled shearlet transform (NSST) [13]. While MRA-based methods tend to preserve spectral information effectively, they may introduce spatial distortions due to the decomposition of spatial structures.

VO-based methods apply spectral and spatial prior constraints between the MS, PAN, and ideal HRMS images to construct a reasonable degradation model and perform optimization solving. VO-based methods can be categorized into model-based optimization, regularization-based optimization, Bayesian-based optimization, and sparse representation-based methods. Examples of classical methods for constructing VO-based models include a variational approach for pan-sharpening [14], total variation (TV) [15], high-quality Bayesian pansharpening [16], and remote sensing image fusion via compressive sensing [17]. VO-based methods typically preserve spatial and spectral information better than CS- and MRA-based methods, which results in superior fusion outcomes. However, making unreasonable model assumptions can lead to unpredictable deviations. Therefore, these methods require accurate mathematical modeling and further efficiency improvements.

DL-based methods have gained widespread application in fields such as pansharpening due to their strong feature extraction and nonlinear learning capabilities. For instance, Masi et al. introduced pansharpening by convolutional neural networks (PNNs) [18], which was the first framework to integrate CNNs into the pansharpening process with a simple three-layer network. Scarpa et al., proposed target-adaptive CNN-based pansharpening (A-PNN) [19], which adapts to different sensor inputs during training and results in a better performance and faster training compared with the PNN method. Generally, DL-based methods can achieve excellent fusion results. However, they require a large amount of training data to optimize the network parameters and significant computational resources for training. Moreover, DL-based models trained on specific datasets tend to perform well only on images that closely match the training data characteristics. Once trained, the network parameters are fixed, which limits their adaptability to new datasets from different sensors. As a result, DL-based methods may struggle to further improve their accuracy without retraining on new data.

Currently, many pansharpening methods face the problem of low correlation and similarity between MS and PAN images, which results in the inaccurate extraction of spatial details and even relying solely on PAN images for detail extraction. During the fusion process, it is challenging to balance the spectral and spatial information, which leads to spatial and spectral distortions in the fused image. Although DL-based methods can be applied to balance spectral and spatial information, supervised training networks, for example, are only suitable for the test dataset. Frequent training on different datasets can sharply increase the training time and costs. To address these issues, a multimodal texture correction and adaptive edge detail fusion model is proposed. The main contributions of this paper are as follows:

(1): To enhance the correlation and similarity between source images, a multimodal texture correction model is proposed. This model takes the intensity component of the LRMS image ( $I_{0}$ ), the PAN image, and the intensity component of the image fused using A-PNN ( $I_{n e t}$ ) as inputs, and outputs a texture-corrected ( $T_{C}$ ) image. The model applies intensity correction constraints between the $T_{C}$ and $I_{0}$ images; gradient correction constraints between the $T_{C}$ , $I_{0}$ , and PAN images; and an A-PNN-based deep plug-and-play correction prior between the $T_{C}$ and $I_{n e t}$ images.
(2): Due to the difficulty in determining the degradation filter in intensity correction constraints, an adaptive degradation filter algorithm is proposed to ensure the accuracy of each constraint prior. This algorithm adaptively determines the degradation filter in the model, thereby enhancing the correlation and similarity between $T_{C}$ and the source images within the multimodal texture correction model.
(3): To achieve accurate spatial information injection, an adaptive edge detail fusion model is proposed. This model adaptively extracts the detail information from $T_{C}$ and applies edge protection; similarly, it extracts the detail information from the UPMS image and applies edge protection. The spatial information of the UPMS image is then elevated to the same level as $T_{C}$ . Finally, the spatial information of the $T_{C}$ and UPMS images is adaptively fused to obtain more accurate spatial information. Extensive experiments were conducted on four datasets in this paper. The subjective and objective evaluation fusion results demonstrate that the proposed algorithm achieved superior performance compared with the other methods while also maintaining high operational efficiency.

The rest of this paper is organized as follows. Section 2 describes the related work. Section 3 provides a detailed introduction to the proposed model. Section 4 presents and analyzes the experimental results of two comparative experiments, namely, simulation and real experiments, conducted on four datasets. Section 5 concludes this paper and suggests directions for future research.

2. Related Works

2.1. Injection Model

Injection models are commonly used in pansharpening methods. The main idea is to inject the high spatial resolution details from the PAN image into the original UPMS image, which has a high spectral resolution, to generate an HRMS image. This approach addresses the issue of LRMS images lacking significant spatial information [20]. Let us assume the dimensions of the LRMS image are

L \times W \times B

(i.e., length

\times

width

\times

number of bands), and the dimensions of the PAN image are

L^{'} \times W^{'}

, where

L^{'} = L / r

and

W^{'} = W / r

, with

r

representing the reduction ratio. Consequently, the dimensions of both the UPMS and HRMS images are

L^{'} \times W^{'} \times B

. The specific formula for the injection model can be uniformly expressed as

M_{H R} = M_{U P} + G S_{D}

(1)

where

M_{H R}

represents the HRMS image,

M_{U P}

represents the UPMS image,

G

represents the injected gain, and

S_{D}

represents the injected spatial detail information. The methods for extracting

S_{D}

can be uniformly divided into CS- and MRA-based methods. For CS-based methods,

S_{D}

can be extracted using the following formula:

S_{D} = P_{I} - I_{U P}

(2)

where

P_{I}

represents the image obtained by histogram matching the intensity component of the UPMS (

I_{U P}

) image to the PAN image. Histogram matching ensures that the intensity and contrast of the PAN and LRMS images are within the same grayscale range, which ensures the accuracy of the spatial information extraction. The formula for

P_{I}

is as follows:

P_{I} = \frac{σ_{I}}{σ_{P}} (P - μ_{P}) + μ_{I}

(3)

where

P

represents the original PAN image;

μ_{P}

and

μ_{I}

denote the mean values of the

P

and

I_{U P}

images, respectively; and

σ_{P}

and

σ_{I}

denote the variances of the

P

and

I_{U P}

images, respectively.

I_{U P}

is obtained from

M_{U P}

through the linear weighting of each band, with the formula as follows:

I_{U P} = \sum_{i = 1}^{B} ω_{i} M_{U P}^{i}

(4)

where

ω

denotes the linear weighting coefficient, and the superscript or subscript i indicates the i-th band of the image. For MRA-based methods,

S_{D}

can be extracted using the following formula:

\begin{array}{l} S_{D} = P - P_{D} \\ P_{D} = H_{L P} P \end{array}

(5)

where

P_{D}

represents the degraded PAN image, which can be obtained by applying a low-pass filter

H_{L P}

to

P

.

H_{L P}

introduces a blur effect to

P

. However, issues such as inaccurate injected spatial detail information still persist. Since the missing spatial detail information in the LRMS image is generally inferred from the PAN image, inaccuracies in this inference and potential mismatches in spectral information during the fusion process can prevent the simultaneous preservation of accurate spectral fidelity and spatial fidelity. This leads to spectral and spatial distortions in the fused image.

2.2. VO-Based Model

VO-based methods have become increasingly popular in recent years. They can establish mathematical models to ensure the accuracy of both the spectral and spatial information in images [21]. The established mathematical model can be viewed as a degradation model, where the ideal HRMS image after fusion is restored from LRMS and PAN images. This process can be seen as the inverse of the degradation of the ideal HRMS image into the source images. Therefore, VO-based methods can preserve spatial and spectral information of LRMS and PAN images through various optimization algorithms, and thus, combine them into the desired ideal HRMS image. Many researchers proposed various VO-based methods. Wu et al. proposed a new variational approach based on proximal deep injection and gradient intensity similarity for spatial–spectral image fusion (DMPIF) [22], which integrates deep convolutional neural networks into the VO-based framework to enhance the performance. Liu et al., introduced a unified pansharpening method with structure tensor-driven spatial consistency and a deep plug-and-play prior [23], which utilizes novel spatial consistency and deep plug-and-play consistency methods to better preserve spatial information while maintaining spectral fidelity, thereby improving the fusion quality. Lu et al. proposed an intensity mixture and band-adaptive detail fusion for pansharpening [24], which maintains intensity gradient information between the source and generated images and adaptively injects detail information guided by source and generated images to preserve the spectral information. Xiao et al. proposed a new context-aware detail injection fidelity method with adaptive coefficients estimation for variational pansharpening (CDIF) [25], which constructs complex relationships in the gradient domain between the PAN and HRMS images, and thus, effectively extracts the essential features from source images and estimates the adaptive coefficients for the model. Ayas et al. proposed an efficient pansharpening method via texture-based dictionary learning and sparse representation [26], which utilizes dictionary learning and sparse representation methods to generate a compact single dictionary from the texture information of MS images and results in more effective fusion outcomes. Iterative optimization of the constructed VO-based models is typically conducted using common methods, such as gradient descent [27], split Bregman iteration [28], fast iterative shrinkage-thresholding (FISTA) [29], and alternating direction method of multipliers (ADMM) [30]. In summary, VO-based methods generally establish an energy function among the LRMS, PAN, and ideal HRMS images, which can be divided into three terms: the first term is for the spectral fidelity, the second term is for the spatial fidelity, and the third term is for the regularization prior. The specific formulas are as follows:

E (M_{H R}) = f_{s p e c t r a l} (M_{0}, M_{H R}) + f_{s p a t i a l} (P, M_{H R}) + f_{p r i o r} (M_{H R})

(6)

where

M_{0}

is the LRMS image. By applying blurring and downsampling operations to

M_{H R}

,

M_{0}

can be obtained.

M_{H R}

can also be obtained from

P

through a linear weighted combination. Therefore, the energy function in Equation (6) can be simplified to the following commonly used form:

E (M_{H R}) = λ_{1} ‖(D H_{L P} M_{H R} - M_{0})‖ + ‖P - C M_{H R}‖ + λ_{2} f_{p r i o r} (M_{H R})

(7)

where

λ_{1}

and

λ_{2}

are penalty parameters,

D

represents the downsampling matrix, and

C

represents the linear weighted combination matrix. By optimizing the above equation,

M_{H R}

can be obtained. Although VO-based methods can simultaneously preserve relatively accurate spectral and spatial information, they rely on the accuracy of the established mathematical model. An unreasonable VO-based model can overlook the correlation and similarity between the MS and PAN images, leading to mismatched spectral and spatial information, which can result in spectral and spatial distortions in the final HRMS image. Furthermore, the efficiency of most VO-based models in pansharpening methods is also relatively low.

3. Methodology

3.1. The Proposed Model Framework

To address the issues of low correlation and similarity between LRMS, PAN, and HRMS images, as well as the inaccuracy of spatial information injected into UPMS images, a pansharpening method based on multimodal texture correction and adaptive edge detail fusion is proposed. This method can improve the spectral and spatial distortions in HRMS images.

The input of the multimodal texture correction model consists of

I_{0}

, PAN, and

I_{n e t}

images, with

T_{C}

as the output. An intensity correction prior is established to correct the intensity constraint between the

I_{0}

and

T_{C}

images. A gradient correction prior is established to correct the gradient constraint for the

I_{0}

, PAN, and

T_{C}

images. A-PNN-based deep plug-and-play correction prior is established to correct the intensity gradient constraint between the

I_{n e t}

and

T_{C}

images. These three correction priors form the foundation of the multimodal texture correction model. Additionally, a proposed adaptive degradation filter algorithm can be used within the intensity correction prior to accurately obtain an adaptive degradation filter

H_{A}

for degrading

T_{C}

. This ensures that the degraded

T_{C}

maintains the highest correlation and similarity with the

I_{0}

image. Finally, the ADMM is employed to optimize this model and obtain

T_{C}

. Due to the high correlation and similarity between

T_{C}

and the source image, after preserving the spectral information of the LRMS image,

T_{C}

inherits the gradient information from the PAN image, and

I_{n e t}

retains more image features, further ensuring the stability of the texture information. Therefore,

T_{C}

can be used to replace the PAN image for subsequent fusion operations.

In the adaptive edge detail fusion model, spatial detail information exists not only in

T_{C}

, but also partially in the MS image. Therefore, adaptive extraction of the detail information from

T_{C}

occurs and edge protection is applied. Simultaneously, the detail information is extracted from the UPMS image using a modulation transfer function (MTF)-matched Gaussian filter, with edge protection applied. The detail information of the edge-protected UPMS image is enhanced to the level of

T_{C}

and adaptively fused with the detail information of the edge-protected

T_{C}

. This process results in spatial information with a high correlation and similarity to the source image. The spatial information is injected into the UPMS image in appropriate proportions to obtain the final HRMS image. The model framework of this study is illustrated in Figure 1, and specific processes are detailed in Section 3.2 and Section 3.3.

3.2. Multimodal Texture Correction Model

3.2.1. Intensity Correction Prior

As stated in Section 2.2, the spectral fidelity term can be obtained by performing operations of blurring and downsampling the HRMS images to obtain LRMS images, as specified by the following formula:

f_{s p e c t r a l}^{i} = \frac{1}{2} {‖D H M_{H R}^{i} - M_{0}^{i}‖}_{F}^{2}

(8)

where

H

typically represents the Gaussian smoothing filter [31]. To maintain an inherent correlation and similarity between the bands, each band’s LRMS and ideal HRMS images are linearly weighted and summed using Equation (4), which results in

I_{0}

and the intensity component of the ideal HRMS image (

I_{H R}

). The specific formula is

f_{s p e c t r a l 1} = \frac{1}{2} {‖D H \sum_{i = 1}^{B} ω_{i} M_{H R}^{i} - \sum_{i = 1}^{B} ω_{i} M_{0}^{i}‖}_{F}^{2} = \frac{1}{2} {‖ D H I_{H R} - I_{0} ‖}_{F}^{2}

(9)

Since

I_{H R}

is unknown, we assumed that

T_{C}

is close to

I_{H R}

and highly correlated. Therefore, the intensity correction prior term is formulated as follows:

E_{i n t e n s i t y} = \frac{1}{2} {‖D H T_{C} - I_{0}‖}_{F}^{2}

(10)

3.2.2. Gradient Correction Prior

As stated in Section 3.2.1, while

T_{C}

preserves the spectral information invariance, spatial information should also be retained. This is achieved by establishing a spatial fidelity term to preserve the gradient information of the PAN image, with the specific formula as follows:

f_{s p a t i a l 1} = \frac{α}{2} {‖\nabla^{2} T_{C} - \nabla^{2} P‖}_{F}^{2}

(11)

where

α

is the penalty parameter, and

\nabla^{2}

denotes the Laplacian operator. Due to the process where

T_{C}

corrects the gradient information of the PAN image, there may be deviations in the intensity correction between the

T_{C}

and

I_{0}

images. Therefore, it is necessary to establish an additional spatial fidelity term to ensure that the intensity correction prior term remains unbiased to further enhance the correlation and similarity between the

T_{C}

and

I_{0}

images. The specific formula is as follows:

f_{s p a t i a l 2} = \frac{β}{2} {‖\nabla^{2} (D H T_{C}) - \nabla^{2} I_{0}‖}_{F}^{2}

(12)

where

β

is the penalty parameter. In summary, the gradient correction prior term can be represented as follows:

E_{g r a d i e n t} = \frac{α}{2} {‖\nabla^{2} T_{C} - \nabla^{2} P‖}_{F}^{2} + \frac{β}{2} {‖\nabla^{2} (D H T_{C}) - \nabla^{2} I_{0}‖}_{F}^{2}

(13)

3.2.3. A-PNN-Based Deep Plug-and-Play Prior

To generate more texture features, it is necessary to further enhance the correlation and similarity between the

T_{C}

,

I_{0}

, and PAN images, and to preserve more spectral and spatial information. After fusing the PAN and UPMS images using A-PNN, the resulting HRMS image is denoted as

M S_{n e t}

. Applying Equation (4) to

M S_{n e t}

for linear weighting yields its intensity component, which is denoted as

I_{n e t}

. A spectral fidelity term is subsequently established between

T_{C}

and

I_{n e t}

to correct the intensity information of

T_{C}

. The specific formula is as follows:

f_{s p e c t r a l 2} = \frac{γ}{2} {‖T_{C} - I_{n e t}‖}_{F}^{2}

(14)

where

γ

is the penalty parameter. The spatial fidelity term is subsequently established between

T_{C}

and

I_{n e t}

to correct the gradient information of

T_{C}

. The specific formula is

f_{s p a t i a l 3} = \frac{δ}{2} {‖\nabla^{2} T_{C} - \nabla^{2} I_{n e t}‖}_{F}^{2}

(15)

where

δ

is the penalty parameter. In summary, the A-PNN-based deep plug-and-play prior can be expressed as

E_{D P P} = \frac{γ}{2} {‖T_{C} - I_{n e t}‖}_{F}^{2} + \frac{δ}{2} {‖\nabla^{2} T_{C} - \nabla^{2} I_{n e t}‖}_{F}^{2}

(16)

3.2.4. Proposed Model

To ensure the sparsity of the output texture image and reduce the artifacts, in addition to incorporating the intensity correction prior term, gradient correction prior term, and the A-PNN-based deep plug-and-play correction prior term mentioned above, we also employed a TV regularization term. Therefore, in this paper, a multimodal texture correction model is proposed, with the following specific formula:

T_{C} = \underset{T_{C}}{\arg \min} \frac{1}{2} {‖D H T_{C} - I_{0}‖}_{F}^{2} + \frac{α}{2} {‖\nabla^{2} T_{C} - \nabla^{2} P‖}_{F}^{2} + \frac{β}{2} {‖\nabla^{2} (D H T_{C}) - \nabla^{2} I_{0}‖}_{F}^{2} + \frac{γ}{2} {‖T_{C} - I_{n e t}‖}_{F}^{2} + \frac{δ}{2} {‖\nabla^{2} T_{C} - \nabla^{2} I_{n e t}‖}_{F}^{2} + θ {‖\nabla^{2} T_{C}‖}_{1}

(17)

where

θ

is the penalty parameter.

3.2.5. Adaptive Degradation Filter Algorithm

In the model shown in Equation (17), all variables except for

H

and

T_{C}

are determined.

T_{C}

can be determined in Section 3.2.6, while

H

is difficult to determine. Therefore, here we propose an adaptive degradation filter algorithm that uses a Gaussian filter as the degradation filter, which is defined as

H_{A}

and can be determined by

H_{A} = \underset{H_{A}}{\arg \min} \frac{1}{2} {‖D H_{A} T_{C} - I_{0}‖}_{F}^{2}

(18)

From the above equation, it can be seen that when the difference between

D H_{A} T_{C}

and

I_{0}

is minimized, the correlation and similarity between them are maximized, making

H_{A}

the optimal degradation filter. Therefore, this algorithm comprehensively considers both the correlation and similarity, which are measured by the correlation coefficient (CC) [32] and structural similarity index measure (SSIM) [33] indices, respectively, to adaptively determine the optimal degradation filter. When the filter processes the image in the spatial domain, convolution operations significantly increase the computational complexity, and when processing in the frequency domain, convolution operations transform into inner product operations, which reduces the computational complexity. Therefore,

H_{A}

was chosen to operate in the frequency domain, with its frequency domain expression given as follows:

H_{A} (u, v) = e^{- \frac{D_{C}^{2} (u, v)}{2 σ^{2}}}

(19)

where

D_{C} (u, v)

represents the distance from point

(u, v)

to the center of the frequency domain, and

σ

represents the standard deviation. After

H_{A}

is transformed to the frequency domain,

T_{C}

also needs to be computed in the frequency domain. Therefore, the fast Fourier transform (FFT) is used to convert

T_{C}

to the frequency domain, and the inverse fast Fourier transform (IFFT) is used to convert

H_{A} T_{C}

back to the spatial domain. This facilitates the subsequent correlation and similarity calculations between

D H_{A} T_{C}

and

I_{0}

. The specific formula is as follows:

D H_{A} T_{C} = D F^{- 1} (H_{A} (u, v) F (T_{C}))

(20)

where

F (\cdot)

denotes the FFT, and

F^{- 1} (\cdot)

denotes the IFFT. In summary, determining

H_{A}

hinges on identifying the unknown parameter

σ

. Therefore, by correcting the correlation and similarity between

D H_{A} T_{C}

and

I_{0}

, an optimal

σ

can be found. The correlation is measured using the CC, which is denoted as

ρ (D H_{A} T_{C}, I_{0})

. The similarity is measured using the SSIM, which is denoted as

S (D H_{A} T_{C}, I_{0})

. By combining these two metrics using the average rule and iterating with different

σ

values, the final result is obtained by taking the maximum value, which indicates the optimal

σ

and is denoted as

σ_{b e s t}

. The specific formula is as follows:

σ_{b e s t} = \underset{σ}{\arg \max} \frac{ρ (D H_{A} T_{C}, I_{0}) + S (D H_{A} T_{C}, I_{0})}{2}

(21)

In summary, the overall process of the adaptive degradation filter algorithm is illustrated as Algorithm 1.

Algorithm 1. Adaptive degradation filter algorithm.

Input : Texture-corrected image T_{C}

,

I component of LRMS image I_{0}

.

Initialize : Set σ^{(0)} = 1

,

step length s = 0.5

,

iterative step k = 0

;

Transform H_{A}

into frequency domain via (19);

Calculate {(D H_{A} T_{C})}^{(0)}

via (20);

Calculate ρ^{(0)} (D H_{A} T_{C}, I_{0})

and S^{(0)} (D H_{A} T_{C}, I_{0})

;

While do σ^{(k + 1)} = σ^{(k)} + s

,

k = k + 1

;

Optimize {(D H_{A} T_{C})}^{(k + 1)}

via (20);

Optimize ρ^{(k + 1)} (D H_{A} T_{C}, I_{0})

and

S^{(k + 1)} (D H_{A} T_{C}, I_{0})

;

Calculate σ^{(k + 1)} = \frac{ρ^{(k + 1)} (D H_{A} T_{C}, I_{0}) + S^{(k + 1)} (D H_{A} T_{C}, I_{0})}{2}

via (21);

Until σ^{(k + 1)} < σ^{(k)}

;

Output : σ_{b e s t} = σ^{(k)}

,

adaptive degradation filter H_{A}

.

3.2.6. Optimization Model Algorithm

The optimization model algorithm uses ADMM to optimize, which decomposes the original problem into several easier-to-handle subproblems. For the ease of optimization, auxiliary variables are introduced:

A = H_{A} T_{C}

,

C = \nabla^{2} T

, and then

B = D A

. Thus, the model described by Equation (17) can be formulated as follows:

\begin{array}{l} \min_{A, B, C} \frac{1}{2} {‖B - I_{0}‖}_{F}^{2} + \frac{α}{2} {‖\nabla^{2} T_{C} - \nabla^{2} P‖}_{F}^{2} + \frac{β}{2} {‖\nabla^{2} B - \nabla^{2} I_{0}‖}_{F}^{2} + \frac{γ}{2} {‖T_{C} - I_{n e t}‖}_{F}^{2} + \frac{δ}{2} {‖\nabla^{2} T_{C} - \nabla^{2} I_{n e t}‖}_{F}^{2} + θ {‖C‖}_{1} \\ s . t . A = H_{A} T_{C}, B = D A, C = \nabla^{2} T_{C} \end{array}

(22)

The augmented Lagrangian function for the above formula can be expressed as follows:

\begin{array}{l} E_{L} (A, B, T_{C}, H_{A}, C, Λ_{1}, Λ_{2}, Λ_{3}) = \frac{1}{2} {‖B - I_{0}‖}_{F}^{2} + \frac{α}{2} {‖\nabla^{2} T_{C} - \nabla^{2} P‖}_{F}^{2} + \frac{β}{2} {‖\nabla^{2} B - \nabla^{2} I_{0}‖}_{F}^{2} + \frac{γ}{2} {‖T_{C} - I_{n e t}‖}_{F}^{2} + \frac{δ}{2} {‖\nabla^{2} T_{C} - \nabla^{2} I_{n e t}‖}_{F}^{2} \\ + θ {‖C‖}_{1} + Λ_{1}^{T} (A - H_{A} T_{C}) + Λ_{2}^{T} (B - D A) + Λ_{3}^{T} (C - \nabla^{2} T_{C}) + \frac{μ_{1}}{2} {‖A - H_{A} T_{C}‖}_{F}^{2} + \frac{μ_{2}}{2} {‖B - D A‖}_{F}^{2} + \frac{μ_{3}}{2} {‖C - \nabla^{2} T_{C}‖}_{F}^{2} \end{array}

(23)

where

Λ_{1}

,

Λ_{2}

, and

Λ_{3}

are Lagrange multipliers, and

μ_{1}

,

μ_{2}

, and

μ_{3}

are penalty parameters. To minimize the energy function in the above equation, iterative optimization is performed for

A^{(k + 1)}

,

B^{(k + 1)}

,

T_{C}^{(k + 1)}

,

H_{A}^{(k + 1)}

,

C^{(k + 1)}

,

Λ_{1}^{(k + 1)}

,

Λ_{2}^{(k + 1)}

, and

Λ_{3}^{(k + 1)}

until convergence, and thus, the final value of

T_{C}

is obtained, where

k

denotes the iteration count. The specific optimization process is shown below:

(1): Optimization for $A^{(k + 1)}$

Fixing the other variables, the subproblem for

A^{(k + 1)}

is as follows:

A^{(k + 1)} = \underset{A}{\arg \min} {(Λ_{1}^{(k)})}^{T} (A^{(k)} - H_{A}^{(k)} T_{C}^{(k)}) + {(Λ_{2}^{(k)})}^{T} (B^{(k)} - D A^{(k)}) + \frac{μ_{1}}{2} {‖A^{(k)} - H_{A}^{(k)} T_{C}^{(k)}‖}_{F}^{2} + \frac{μ_{2}}{2} {‖B^{(k)} - D A^{(k)}‖}_{F}^{2}

(24)

Setting the deviation of

A^{(k + 1)}

to zero, i.e.,

\partial E_{L} / \partial A^{(k + 1)} = 0

,

A^{(k + 1)}

is determined by the following equation:

A^{(k + 1)} = \frac{- Λ_{1}^{(k)} + D^{T} Λ_{2}^{(k)} + μ_{1} H_{A}^{(k)} T_{C}^{(k)} + μ_{2} D^{T} B^{(k)}}{μ_{1} U + μ_{2} D^{T} D}

(25)

where

U

represents the identity matrix, and the superscript

T

denotes the transpose operator.

(2): Optimization for $B^{(k + 1)}$

Fixing the other variables, the subproblem for

B^{(k + 1)}

is as follows:

B^{(k + 1)} = \underset{B}{\arg \min} \frac{1}{2} {‖B^{(k)} - I_{0}‖}_{F}^{2} + \frac{β}{2} {‖\nabla^{2} B^{(k)} - \nabla^{2} I_{0}‖}_{F}^{2} + {(Λ_{2}^{(k)})}^{T} (B^{(k)} - D A^{(k + 1)}) + \frac{μ_{2}}{2} {‖B^{(k)} - D A^{(k + 1)}‖}_{F}^{2}

(26)

The deviation of

B^{(k + 1)}

is set to zero, i.e.,

\partial E_{L} / \partial B^{(k + 1)} = 0

; however, due to the presence of the Laplacian operator, this increases the computational complexity during the solving process. To enhance the computational efficiency, FFT and IFFT are employed, which facilitates rapid calculations in the frequency domain before transforming back to the spatial domain. Therefore, after optimizing

A^{(k + 1)}

,

B^{(k + 1)}

can be determined by the following equation:

B^{(k + 1)} = F^{- 1} (\frac{F (I_{0} + β {(\nabla^{2})}^{T} \nabla^{2} I_{0} - Λ_{2}^{(k)} + μ_{2} D A^{(k + 1)})}{F ((1 + μ_{2}) U + β {(\nabla^{2})}^{T} \nabla^{2})})

(27)

(3): Optimization for $T_{C}^{(k + 1)}$

Fixing the other variables, the subproblem for

T_{C}^{(k + 1)}

is as follows:

\begin{array}{l} T_{C}^{(k + 1)} = \underset{T_{C}}{\arg \min} \frac{α}{2} {‖\nabla^{2} T_{C}^{(k)} - \nabla^{2} P‖}_{F}^{2} + \frac{γ}{2} {‖T_{C}^{(k)} - I_{n e t}‖}_{F}^{2} + \frac{δ}{2} {‖\nabla^{2} T_{C}^{(k)} - \nabla^{2} I_{n e t}‖}_{F}^{2} + {(Λ_{1}^{(k)})}^{T} (A^{(k + 1)} - H_{A}^{(k)} T_{C}^{(k)}) \\ + {(Λ_{3}^{(k)})}^{T} (C^{(k)} - \nabla^{2} T_{C}^{(k)}) + \frac{μ_{1}}{2} {‖A^{(k + 1)} - H_{A}^{(k)} T_{C}^{(k)}‖}_{F}^{2} + \frac{μ_{3}}{2} {‖C^{(k)} - \nabla^{2} T_{C}^{(k)}‖}_{F}^{2} \end{array}

(28)

The deviation of

T_{C}^{(k + 1)}

is set to zero, i.e.,

\partial E_{L} / \partial T_{C}^{(k + 1)} = 0

; due to the presence of the Laplacian operator, FFT and IFFT are also employed for solving. Therefore, after optimizing

A^{(k + 1)}

,

T_{C}^{(k + 1)}

can be determined by the following equation:

T_{C}^{(k + 1)} = F^{- 1} (\frac{F (α {(\nabla^{2})}^{T} \nabla^{2} P + γ I_{n e t} + δ {(\nabla^{2})}^{T} \nabla^{2} I_{n e t} + {(H_{A}^{(k)})}^{T} Λ_{1}^{(k)} + {(\nabla^{2})}^{T} Λ_{3}^{(k)} + μ_{1} {(H_{A}^{(k)})}^{T} A^{(k + 1)} + μ_{3} {(\nabla^{2})}^{T} C^{(k)})}{F ((α + δ + μ_{3}) {(\nabla^{2})}^{T} \nabla^{2} + γ + μ_{1} {(H_{A}^{(k)})}^{T} H_{A}^{(k)})})

(29)

(4): Optimization for $C^{(k + 1)}$

Fixing the other variables, the subproblem for

C^{(k + 1)}

is as follows:

C^{(k + 1)} = \underset{C}{\arg \min} θ {‖C^{(k)}‖}_{1} + {(Λ_{3}^{(k)})}^{T} (C^{(k)} - \nabla^{2} T_{C}^{(k + 1)}) + \frac{μ_{3}}{2} {‖C^{(k)} - \nabla^{2} T_{C}^{(k + 1)}‖}_{F}^{2} = \underset{C}{\arg \min} \frac{θ}{μ_{3}} {‖C^{(k)}‖}_{1} + \frac{1}{2} {‖C^{(k)} - (\nabla^{2} T_{C}^{(k + 1)} - \frac{Λ_{3}^{(k)}}{μ_{3}})‖}_{F}^{2}

(30)

Further simplifying using the soft thresholding (

S_{T}

) formula yields the following equation:

C^{(k + 1)} = S_{T} (\nabla^{2} T_{C}^{(k + 1)} - \frac{Λ_{3}^{(k)}}{μ_{3}}, \frac{θ}{μ_{3}}) = sgn (\nabla^{2} T_{C}^{(k + 1)} - \frac{Λ_{3}^{(k)}}{μ_{3}}) \max (|\nabla^{2} T_{C}^{(k + 1)} - \frac{Λ_{3}^{(k)}}{μ_{3}}| - \frac{θ}{μ_{3}}, 0)

(31)

where

sgn (\cdot)

is the sign function, and

\max (\cdot)

is the maximum function.

(5): Optimization for $Λ_{1}^{(k + 1)}$ , $Λ_{2}^{(k + 1)}$ , and $Λ_{3}^{(k + 1)}$

Fixing the other variables, the subproblems for

Λ_{1}^{(k + 1)}

,

Λ_{2}^{(k + 1)}

, and

Λ_{3}^{(k + 1)}

are obtained through the gradient ascent method:

\{\begin{cases} Λ_{1}^{(k + 1)} = Λ_{1}^{(k)} + φ^{(k + 1)} (A^{(k + 1)} - H_{A}^{(k + 1)} T_{C}^{(k + 1)}) \\ Λ_{2}^{(k + 1)} = Λ_{2}^{(k)} + φ^{(k + 1)} (B^{(k + 1)} - D A^{(k + 1)}) \\ Λ_{3}^{(k + 1)} = Λ_{3}^{(k)} + φ^{(k + 1)} (C^{(k + 1)} - \nabla^{2} T_{C}^{(k + 1)}) \end{cases}

(32)

where

φ

represents the step length required for the gradient ascent, as given by the following formula:

φ^{(k + 1)} = τ φ^{(k)}

(33)

where

τ

is a penalty parameter, with the condition

τ > 1

, which accelerates the convergence rate. In summary, the overall optimization process of the multimodal texture correction model is illustrated as Algorithm 2. In this process,

H_{A}^{(k + 1)}

is optimized using Algorithm 1. The iteration stops when the relative change (

R e l C h a

) in

T_{C}

between two consecutive iterations is less than a tolerance deviation

ε

. The final

T_{C}

is obtained accordingly. The formula for the relative change determination is as follows:

R e l C h a = \frac{{‖T_{C}^{(k + 1)} - T_{C}^{(k)}‖}_{F}}{{‖T_{C}^{(k)}‖}_{F}} < ε

(34)

As the iterations progress,

R e l C h a

gradually decreases. Therefore, it is necessary to determine the parameter

ε

such that it is slightly larger than

R e l C h a

to balance the efficiency and accuracy of the model. For instance, Figure 2 shows the convergence result from the test image in the WorldView-3 dataset. When the number of iterations reached around 15,

R e l C h a

tended to converge and approach

10^{- 4}

. Therefore,

ε

could be assigned a value of

10^{- 4}

.

Algorithm 2. Optimization algorithm of the multimodal texture correction model.

Input: PAN image

P

, I component of LRMS image I_{0}

.

Initialize : Set T_{C}^{(0)} = P

, H_{A}^{(0)}

is initialized by Algorithm 1, Λ_{1}^{(0)} = Λ_{2}^{(0)} = Λ_{3}^{(0)} = U

, φ^{(0)} = 1

, τ = 1.01

, k = 0

.

While R e l C h a > ε

do

Optimize A^{(k + 1)}

via (25);

Optimize B^{(k + 1)}

via (27);

Optimize T_{C}^{(k + 1)}

via (29);

Optimize H_{A}^{(k + 1)}

via Algorithm 1;

Optimize C^{(k + 1)}

via (31);

Optimize Λ_{1}^{(k + 1)}

, Λ_{2}^{(k + 1)}

, and Λ_{3}^{(k + 1)}

via (32);

φ^{(k + 1)} = τ φ^{(k)}

, k = k + 1

.
End While

Output : Texture-corrected image T_{C}

.

3.3. Adaptive Edge Detail Fusion Model

3.3.1. Adaptive Extraction of T_C Image Detail and Applying Edge Protection

After obtaining

T_{C}

using Algorithm 2, the following formula was selected to extract the details

D_{T_{C}}

:

D_{T_{C}} = T_{C} - T_{C L}

(35)

where

T_{C L}

denotes the low-resolution form of

T_{C}

. To extract the details more accurately from

T_{C}

, from Equations (2) and (5), it is known that

T_{C L}

can be obtained through two methods. The first method involves obtaining a degraded image

T_{C D}

of

T_{C}

using Algorithm 1, which is akin to the MRA-based methods for detail extraction, which better preserve the spectral information. The second method involves obtaining

I_{U P}

using Equation (4), which is akin to the CS-based methods for detail extraction, which better preserve the spatial information. Therefore, considering the advantages of these two methods, an adaptive extraction of

D_{T_{C}}

was designed as follows:

T_{C L} = χ_{1} I_{U P} + (1 - χ_{1}) T_{C D} s . t . 0 < χ_{1} < 1

(36)

where

χ_{1}

represents the weight coefficient to be determined. Due to the influence of the correlation and similarity between the source images on the accuracy of the detail extraction using both methods, the influence coefficient for

I_{U P}

can be set as

x_{1}

and for

T_{C D}

as

x_{2}

. The formulas are as follows:

\begin{array}{l} x_{1} = \frac{ρ (T_{C}, I_{U P}) + S (T_{C}, I_{U P})}{2} \\ x_{2} = \frac{ρ (T_{C}, T_{C D}) + S (T_{C}, T_{C D})}{2} \end{array}

(37)

Since

x_{1}

and

x_{2}

do not satisfy the normalization constraint of

χ_{1}

,

χ_{1}

should be positively correlated with

x_{1}

and

x_{2}

within a reasonable range. Therefore,

χ_{1}

can be obtained using the following equation:

χ_{1} = \sqrt{1 - e^{- x_{3}}} s . t . x_{3} = \frac{x_{1}}{x_{1} + x_{2}}

(38)

After substituting

χ_{1}

from the above equation into Equation (36) to obtain

T_{C L}

, and then substituting

T_{C L}

into Equation (35),

D_{T_{C}}

is finally obtained, which completes the operation of the adaptive detail extraction from the

T_{C}

image. To simultaneously preserve the edge information during the detail extraction process, the following edge detection matrix formula

E_{T_{C}}

is utilized to extract edges [34]:

E_{T_{C}} = e^{- \frac{η}{{|\nabla T|}^{4} + ζ}}

(39)

where

η

and

ζ

are modulation coefficients, and

\nabla

denotes the gradient operator. Generally,

η

is set to

10^{- 9}

and

ζ

to

10^{- 10}

. Therefore, the detail information

F_{1}

of the

T_{C}

image with edge protection applied is as follows:

F_{1} = D_{T_{C}} E_{T_{C}}

(40)

3.3.2. Extracting Detail from UPMS Image and Applying Edge Protection

The following formula is selected to extract the details of the UPMS image

D_{M}

:

D_{M}^{i} = M_{U P}^{i} - M_{U P L}^{i}

(41)

where

M_{U P L}

represents the low-resolution version of the UPMS image. Since

M_{U P L}

is unknown, the MTF [35,36] obtained from the MS sensor is introduced as a crucial indicator for extracting details from the UPMS image. Therefore, an MTF-matched Gaussian filter

H_{M G}

is applied to degrade the UPMS image, which results in its low-resolution version. The specific process is shown in the following equation:

M_{U P L}^{i} = H_{M G} M_{U P}^{i}

(42)

Substituting the above equation into Equation (41) yields the detail information of the UPMS image. At this point, it is necessary to apply edge protection to

D_{M}

using the edge detection matrix formula

E_{M}

[34]:

E_{M}^{i} = e^{- \frac{η}{{|\nabla M_{U P}^{b}|}^{4} + ζ}}

(43)

Therefore, the detail information

F_{2}

of the UPMS image with edge protection applied is as follows:

F_{2}^{i} = D_{M}^{i} E_{M}^{i}

(44)

3.3.3. Adaptive Edge Detail Fusion Process

After extracting the edge-protected detail information from the

T_{C}

and UPMS images,

F_{1}

and

F_{2}

can be fused. However, since the spatial resolution of the UPMS image is lower than that of

T_{C}

,

F_{2}

contains less detail information than

F_{1}

. Directly fusing them may result in a loss of detail information. To avoid this situation, the information in

F_{2}

is enhanced to match the level of

F_{1}

before the fusion. The specific formula is as follows:

ξ_{i} = \underset{ξ_{i}}{\arg \min} \frac{1}{2} {‖F_{1} - ξ_{i} F_{2}^{i}‖}_{F}^{2}

(45)

where

ξ

is the scaling factor, which is determined using a linear regression model [37]. Therefore, the spatial information enhanced by

ξ

, which is denoted as

F_{3}

, is expressed as follows:

F_{3}^{i} = ξ_{i} F_{2}^{i}

(46)

At this point,

F_{1}

and

F_{3}

can be adaptively fused to obtain the detail information

F

. The specific algorithm is as follows:

F_{i} = χ_{2} F_{1} + (1 - χ_{2}) F_{3}^{i}

(47)

where

χ_{2}

is the weight coefficient. The allocation of weight to the detail information is influenced by the correlation and similarity between the

T_{C}

and UPMS images. Therefore, Equation (37) can be used to establish the relationship between

T_{C}

and

I_{U P}

as

x_{1}

, while ensuring that

χ_{2}

remains within a reasonable range and is positively correlated with

x_{1}

. The specific formula is as follows:

χ_{2} = \sqrt{1 - e^{- x_{1}}}

(48)

Substituting

χ_{2}

from the above equation into Equation (47) yields the final

F

.

3.3.4. Final Injection of Spatial Edge Detail Information

By substituting

F

from Equation (47) into the injection model below, the final HRMS image is obtained:

M_{H R}^{i} = M_{U P}^{i} + g_{i} \frac{M_{U P}^{i}}{\frac{1}{B} \sum_{i = 1}^{B} M_{U P}^{i}} F_{i}

(49)

where

g

represents the scaling factor for the injected details, which can be adaptively determined by the following formula:

g_{i} = \frac{σ^{2} (T_{C}) + cov (T_{C}, M_{U P}^{i})}{σ^{2} (T_{C})}

(50)

where

cov (\cdot)

represents the covariance function, and

σ^{2}

represents the variance function.

4. Experiments and Results

4.1. Experimental Design

In Section 4, to demonstrate the performance advantages and effectiveness of the proposed algorithm, the proposed method is compared with ten methods: GSA [38], NIHS [39], BDSD-PC [40], FusionNet [41], ATWT-W3 [42], BT-H [43], SR-D [44], DMPIF [22], CDIF [25], and A-PNN [19]. Extensive experiments were conducted using four datasets: GaoFen-2, QuickBird, WorldView-2, and WorldView-3 [45]. Each image pair in the datasets included one MS image and one PAN image. In the GaoFen-2 and QuickBird datasets, the MS images had four bands, whereas in the WorldView-2 and WorldView-3 datasets, the MS images had eight bands. All datasets contained PAN images with only one band.

To better evaluate the performance of the proposed algorithm, two comparison experiments were conducted. The first experiment was a simulation experiment, i.e., a reduced-scale (RS) experiment. According to the Wald protocol, the original MS image was used as a reference image, also known as the ground truth (GT) image [46]. In this experiment, the original MS and PAN images were downsampled by a factor of four. The downsampled images served as the source images for the RS experiment. The algorithm proposed in this paper was used to fuse these source images, and the fused image was compared with the GT image. A smaller difference indicates better performance. Therefore, in this experiment, each band of the GT image was cropped to

256 \times 256

pixels, which resulted in each band of the MS image being cropped to

64 \times 64

pixels, and the PAN image was cropped to

256 \times 256

pixels.

The second experiment was a real experiment, i.e., a full-scale (FS) experiment. After successfully implementing the RS experiment, the FS experiment could be conducted, where the source images were directly fused. Since there was no GT image available as a reference, the polynomial kernel upsampling (EXP) [47] method with twenty-three coefficients was used as the spectral benchmark. Additionally, each band of the original MS image was cropped to

128 \times 128

pixels, and the PAN image was cropped to

512 \times 512

pixels. As a result, each band of the fused image was also

512 \times 512

pixels. Detailed information about these four datasets is summarized in Table 1.

To evaluate and compare the image quality of different methods, combined objective and subjective evaluation criteria were adopted. In the RS experiment, nine commonly used objective evaluation metrics were employed: the Q2n index (Q4 for four-band datasets, Q8 for eight-band datasets) [48] to assess the spatial and spectral qualities, the peak signal-to-noise ratio (PSNR) [49] to measure the error between the reconstructed and reference images, the universal image quality index (UIQI) [50] to comprehensively evaluate the quality differences and similarities after the fusion, the relative average spectral error (RASE) [51] to evaluate the average spectral differences before and after the fusion, the root-mean-square error (RMSE) to evaluate the overall difference between the fused image and the reference image, the error relative global dimensionless synthesis (ERGAS) [52] to indicate the distortion levels in spatial and spectral information, the spectral correlation coefficient (SCC) [53] to measure the preservation of the spectral information in the images, the correlation coefficient (CC) [32] to indicate the degree of correlation between the fused image and the reference image, and the structural similarity index measure (SSIM) [33] to evaluate the similarity between the fused image and the reference image. The subjective evaluation visualized the fused MS image by extracting the red (R), green (G), and blue (B) bands to display true-color fused images, which provided a more intuitive reflection of the quality differences in the image.

In the FS experiment, three additional objective evaluation metrics were used:

D_{λ}

[54] for the spectral distortion during the fusion,

D_{S}

[54] for the spatial distortion during the fusion, and the quality without reference (QNR) [54] to assess the quality of the fused images. In the above evaluation metrics, the ideal values are as follows: 1 for Q2n, UIQI, SCC, CC, SSIM, and QNR; 0 for RASE, RMSE, ERGAS,

D_{λ}

, and

D_{S}

; and infinity for PSNR. The datasets used for the RS and FS experiments discussed in Section 4.2 and Section 4.3 are illustrated in Table 1. Each experiment included subjective and objective evaluations of a pair of images from their respective datasets. All experiments discussed in Section 4 were conducted on a PC equipped with an Intel Core i7-12700 CPU running at a base speed of 2.10 GHz with 32 GB of memory. The experimental platform used was MATLAB R2021b.

4.2. Reduced-Scale Experiments

4.2.1. QuickBird Dataset

For the RS experiment, Figure 3 shows the subjective evaluation fusion results of the proposed method and various compared methods on the QuickBird dataset, where the GT image served as the reference. To clarify the spatial and spectral information of the images, the local fusion results were magnified. From the enlarged red rectangles in the local area, it can be observed that the GSA and BT-H methods show excessive details in the roofs of the houses. The images produced by the NIHS, SR-D, BDSD-PC, and ATWT-M3 methods were relatively blurry and darker. The images produced by the FusionNet method suffered from issues such as excessive brightness and noticeable deviations in the edge information. The CDIF method maintained good edge information but introduced artifacts. Despite the fact the DMPIF and A-PNN methods preserved the spatial information relatively well, they exhibited excessive color information at the edges, which resulted in relatively severe spectral distortion. In contrast, the results of the proposed method in this paper closely approximated the GT image, which effectively preserved both the spatial and spectral information. The objective evaluation fusion results of Figure 3 are shown in Table 2, where the values inside parentheses indicate the ideal outcomes, and the metrics highlighted in bold black text indicate the optimal results. It can be observed that compared with the other ten methods, the proposed method achieved the superior results across all evaluation metrics and required relatively less time.

4.2.2. WorldView-2 Dataset

Figure 4 presents the subjective evaluation fusion results of the WorldView-2 dataset from the various compared methods. From the enlarged red rectangles, it can be observed that the GSA, BDSD-PC, ATWT-M3, and SR-D methods exhibited issues such as poor image clarity, severe spatial distortion, and darker colors. The NIHS method showed issues, with the excessive injection of spatial information in certain areas. In the FusionNet method, there was an issue with inaccurately preserving spatial detail information. The CDIF method showed significant spatial and spectral distortion issues. The DMPIF method lacked clarity compared with the GT image and suffered from severe artifact problems. In the A-PNN and BT-H methods, spectral distortions were present in some areas, along with the poor retention of spatial information. The proposed method in this paper aligned most closely with the GT image and visually outperformed other compared methods. Table 3 displays the objective evaluation fusion results from Figure 4. Compared with the other ten methods, the proposed method achieved superior results across all evaluation metrics and operated in a relatively shorter time.

4.2.3. WorldView-3 Dataset

Figure 5 illustrates the subjective evaluation fusion results of the WorldView-3 dataset across various methods. From the enlarged red rectangles, it is evident that the GSA method exhibited darker colors on the rooftops compared with the GT image. The NIHS method showed some artifacts, which impacted the spatial information quality of the image. The images produced by the BDSD-PC, ATWT-M3, and SR-D methods appeared blurry. The images from the BT-H method had excessively high brightness and exhibited significant spectral distortion. The FusionNet method introduced excessive detail into the images, which resulted in some color alterations. Although the CDIF method preserved the spectral information well, it lacked details and suffered from significant spatial distortion. The DMPIF and A-PNN methods exhibited some color changes, which led to severe spectral distortion. The proposed method in this paper aligned closest with the GT image, as seen by the superior subjective visual results achieved. Table 4 presents the objective evaluation fusion results from Figure 5. It is evident that our method outperformed others across all evaluation metrics, with shorter processing times.

4.3. Full-Scale Experiments

4.3.1. GaoFen-2 Dataset

For the FS experiment, Figure 6 presents the subjective evaluation fusion results of the proposed method compared with the other methods, where the EXP image was used as the spectral reference. From the enlarged red rectangles, it is evident that the GSA, FusionNet, BT-H, and A-PNN methods exhibited significant color changes compared with the EXP image, which resulted in severe spectral distortion. The images produced by the DMPIF method exhibited significant problems, with artifacts and noticeable color changes. The NIHS, BDSD, ATWT-M3, and SR-D methods produced blurry images with substantial spatial distortion. The CDIF method introduced extraneous color artifacts and showed blurriness in some regions. The proposed method in this paper enhanced the spatial resolution of the UPMS image while maintaining spectral information close to the EXP image, and thus, yielded superior visual results compared with the other methods. The objective evaluation fusion results of Figure 6 are shown in Table 5. The results indicate that our method outperformed the others in the

D_{s}

and QNR metrics and was slightly inferior in the

D_{λ}

metric, where it achieved the highest spatial resolution with minimal spectral loss.

4.3.2. WorldView-2 Dataset

Figure 7 presents the subjective evaluation fusion results of the WorldView-2 dataset from the various compared methods. From the enlarged red rectangle, it can be seen that the GSA and FusionNet methods exhibited more noticeable color changes compared with the EXP images, which led to more pronounced spectral distortions. The images processed by the NIHS, BDSD-PC, and ATWT-M3 methods were significantly blurry, with pronounced spatial distortions. The BT-H method resulted in images with darker colors, which led to some spectral distortion. The SR-D method exhibited poor edge preservation and significant deviations in spatial detail information injection. The DMPIF and A-PNN methods exhibited unnecessary color markers and produced artifacts. The CDIF method exhibited deviations in edge preservation between the red and green in the images, and the colors changed. Our proposed method, while maintaining spectral proximity to EXP, preserved accurate spatial information and visually outperformed the other compared methods. The objective evaluation fusion results of Figure 7 are shown in Table 6. The results indicate that our method outperformed the others in terms of the

D_{s}

and QNR metrics, while slightly trailing the NIHS method in the

D_{λ}

metric. This achievement ensured the highest spatial resolution of the images with minimal spectral loss. In summary, across the RS and FS experiments on the four datasets, the proposed method in this paper consistently outperformed other compared methods. It effectively balanced the spectral and spatial information to achieve a superior image quality in less time.

4.4. Parameters Analysis

From Algorithm 2, it is evident that certain unknown parameters still required determination. To enhance the stability of the proposed model in this paper, a grid search method was employed to adaptively determine these parameters. Parameters

μ_{1}

,

μ_{2}

, and

μ_{3}

play similar roles in Equation (23). Therefore, to reduce the parameter complexity, we set

μ = μ_{1} = μ_{2} = μ_{3}

. At this point, the six parameters that needed to be determined are

α

,

β

,

γ

,

δ

,

θ

, and

μ

. Since parameters

α

and

β

describe the relationship between the source image and the

T_{C}

image, we first combined them to determine these two parameters. Next, parameters

γ

and

δ

describe the spectral and spatial fidelities of the A-PNN-based deep plug-and-play term, and thus, we combined them to determine the parameters. Finally, we determined the remaining two parameters:

θ

and

μ

.

In the RS experiments, the Q2n metric was used to evaluate the spatial and spectral qualities of the fused images. In the FS experiment, the QNR metric was used for the same purpose. The results are shown in Figure 8. First, with the other parameters fixed,

α

and

β

were searched. As seen in Figure 8a–e, the optimal values for the QuickBird dataset were 24.7 and

4 \times 10^{- 4}

; for the WorldView-2 dataset in the RS experiment, they were 18.4 and

2 \times 10^{- 2}

; for the WorldView-3 dataset, they were 15 and

2 \times 10^{- 3}

; for the GaoFen-2 dataset, they were 33.1 and

1 \times 10^{- 7}

; and for the WorldView-2 dataset in the FS experiment, they were 9.2 and

8 \times 10^{- 4}

, respectively. Similarly, searching for

γ

and

δ

, Figure 8f–j show that the optimal values for the QuickBird dataset were

9 \times 10^{- 1}

and

7 \times 10^{- 3}

; for the WorldView-2 dataset in the RS experiment, they were 1.6 and

6 \times 10^{- 4}

; for the WorldView-3 dataset, they were

9 \times 10^{- 1}

and 6; for the GaoFen-2 dataset, they were

3 \times 10^{- 2}

and

1 \times 10^{- 6}

; and for the WorldView-2 dataset in the FS experiment, they were

5 \times 10^{- 1}

and 1, respectively. Finally, searching for the remaining two parameters,

θ

and

μ

, Figure 8k–o indicate that the optimal values for the QuickBird dataset were

3.8 \times 10^{2}

and

6 \times 10^{- 1}

; for the WorldView-2 dataset in the RS experiment, they were

1.08 \times 10^{2}

and 2; for the WorldView-3 dataset, they were 23.5 and 2.7; for the GaoFen-2 dataset, they were

1.6 \times 10^{2}

and 9.8; and for the WorldView-2 dataset in the FS experiment, they were 12 and 2.4, respectively. In summary, after determining these optimal parameter values, the algorithm in this study achieved its superior fusion performance.

4.5. Ablation Study

Using a pair of images from the WorldView-3 dataset in Section 4.2.3, an ablation study experiment was conducted to validate the effectiveness of the algorithm proposed in this paper. The algorithm consisted of the multimodal texture correction model (MTC) and the adaptive edge detail fusion model (AEDF). These models were divided into five ablation models, as detailed in Table 7. For Model 1, which lacked both MTC and AEDF, the fusion images were generated using the following injection model:

M_{H R}^{i} = M_{U P}^{i} + g_{i} \frac{M_{U P}^{i}}{\frac{1}{B} \sum_{i = 1}^{B} M_{U P}^{i}} (P - I_{U P})

(51)

In Models 2 to 4, Equation (17) was assigned values according to the parameters in Table 7, which resulted in

T_{C}

replacing

P

in the above equation. The injection model used was as follows:

M_{H R}^{i} = M_{U P}^{i} + g_{i} \frac{M_{U P}^{i}}{\frac{1}{B} \sum_{i = 1}^{B} M_{U P}^{i}} (T_{C} - I_{U P})

(52)

Model 5 corresponded to the algorithm proposed in this paper. Table 7 presents the objective evaluation fusion results from Models 1 to 5, indicating improved performance with the inclusion of different ablation models. Figure 9 illustrates the subjective evaluation fusion results of Models 1 to 5. From the figure, it is evident that both subjective fusion performances of Models 1 to 5 progressively improved and approached closer to the GT image. This further validated the effectiveness of the algorithm proposed in this study.

5. Conclusions

Due to the low correlation and similarity between MS and PAN images acquired from different sensors, direct fusion can lead to significant spectral and spatial distortions. Moreover, achieving an ideal HRMS image requires accurately injecting spatial information from the PAN image into the UPMS image. However, inaccurate spatial information injection can degrade the spatial resolution of the HRMS image. To address these issues, this paper proposes a method based on multimodal texture correction and adaptive edge detail fusion models. The primary objective was to obtain a

T_{C}

image that inherits precise spatial detail information from the PAN image while maintaining high correlation and similarity with the MS image. Several constraints were established for this purpose: intensity constraint between

T_{C}

and

I_{0}

; gradient constraint between

T_{C}

, PAN, and

I_{0}

; and an A-PNN-based deep plug-and-play constraint between

T_{C}

and

I_{n e t}

. An adaptive degradation filter algorithm is proposed to accurately maintain these constraints. Ultimately, a multimodal texture correction model was constructed. The ADMM algorithm is employed to solve this problem and generate

T_{C}

, which can effectively replace the functionality of the PAN image. Since spatial detail information is not solely present in

T_{C}

but also exists in LRMS image, an adaptive edge detail fusion model is proposed. This model extracts detail information from both the

T_{C}

and UPMS images while applying edge protection. To extract detail information more accurately, an adaptive algorithm is used to extract details from

T_{C}

, and MTF-matched Gaussian filters are used to extract details from the UPMS image. The edge-protected details from

T_{C}

are adaptively fused with the enhanced edge-protected details from the UPMS image. Finally, the fused spatial details are injected into the UPMS image to generate the final HRMS image. Extensive comparative experiments in RS and FS validated the performance advantages of the proposed algorithm. A parameter analysis and ablation study further confirmed its effectiveness by demonstrating superior fusion results.

In the multimodal texture correction model, iterative optimization conducted on two-dimensional images significantly improved the solving efficiency. The three correction prior terms effectively preserved the spatial and spectral information. However, this model still has some drawbacks, as the correction prior terms include unknown parameters that need to be determined through experiments, which potentially consume substantial computational resources and time. In the adaptive edge detail fusion model, to obtain accurate spatial information, both the edge detail information from the

T_{C}

and UPMS images are comprehensively considered. However, issues such as the mismatch between the spatial information injected into the UPMS images and their spectral information still persist. Therefore, our future work will focus on adaptively determining other unknown parameters in the pansharpening model and exploring more suitable injection model methods to enhance the overall performance and efficiency.

Author Contributions

Conceptualization, E.W.; methodology, E.W.; validation, J.W. and L.D.; formal analysis, E.W.; investigation, J.W. and L.D.; writing—original draft preparation, E.W.; writing—review and editing, D.L. and J.A.B.; visualization, E.W.; supervision, D.L. and J.A.B.; project administration, D.L.; funding acquisition, L.W. All authors read and agreed to the published version of this manuscript.

Funding

This research was funded by the Fundamental Research Funds for the Central Universities under nos. 04442024040 and 04442024041. This work was supported in part by the National Natural Science Foundation of China under grant 62071084.

Data Availability Statement

The original contributions presented in the study are included in the article, further inquiries can be directed to the corresponding author.

Acknowledgments

The authors sincerely thank the academic editors and reviewers for their useful comments and constructive suggestions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Zhang, K.; Zhang, F.; Wan, W.; Yu, H.; Sun, J.; Del Ser, J.; Elyan, E.; Hussain, A. Panchromatic and multispectral image fusion for remote sensing and earth observation: Concepts, taxonomy, literature review, evaluation methodologies and challenges ahead. Inf. Fusion 2023, 93, 227–242. [Google Scholar] [CrossRef]
Meng, X.; Shen, H.; Li, H.; Zhang, L.; Fu, R. Review of the pansharpening methods for remote sensing images based on the idea of meta-analysis: Practical discussion and challenges. Inf. Fusion 2019, 46, 102–113. [Google Scholar] [CrossRef]
Vivone, G.; Alparone, L.; Chanussot, J.; Dalla Mura, M.; Garzelli, A.; Licciardi, G.A.; Restaino, R.; Wald, L. A critical comparison among pansharpening algorithms. IEEE Trans. Geosci. Remote Sens. 2014, 53, 2565–2586. [Google Scholar] [CrossRef]
Choi, Y. A Novel Multimodal Image Fusion Method Using Hybrid Wavelet-Based Contourlet Transform. Ph.D. Thesis, University of Nevada, Reno, NV, USA, 2014. [Google Scholar]
Vivone, G.; Dalla Mura, M.; Garzelli, A.; Restaino, R.; Scarpa, G.; Ulfarsson, M.O.; Alparone, L.; Chanussot, J. A new benchmark based on recent advances in multispectral pansharpening: Revisiting pansharpening with classical and emerging pansharpening methods. IEEE Geosci. Remote Sens. Mag. 2020, 9, 53–81. [Google Scholar] [CrossRef]
El-Mezouar, M.C.; Taleb, N.; Kpalma, K.; Ronsin, J. An IHS-based fusion for color distortion reduction and vegetation enhancement in IKONOS imagery. IEEE Trans. Geosci. Remote Sens. 2010, 49, 1590–1602. [Google Scholar] [CrossRef]
Shahdoosti, H.R.; Ghassemian, H. Combining the spectral PCA and spatial PCA fusion methods by an optimal filter. Inf. Fusion 2016, 27, 150–160. [Google Scholar] [CrossRef]
Choi, J.; Yu, K.; Kim, Y. A new adaptive component-substitution-based satellite image fusion by using partial replacement. IEEE Trans. Geosci. Remote Sens. 2010, 49, 295–309. [Google Scholar] [CrossRef]
Garzelli, A.; Nencini, F.; Capobianco, L. Optimal MMSE pan sharpening of very high resolution multispectral images. IEEE Trans. Geosci. Remote Sens. 2007, 46, 228–236. [Google Scholar] [CrossRef]
Vivone, G.; Restaino, R.; Chanussot, J. Full scale regression-based injection coefficients for panchromatic sharpening. IEEE Trans. Image Process. 2018, 27, 3418–3431. [Google Scholar] [CrossRef]
Cheng, J.; Liu, H.; Liu, T.; Wang, F.; Li, H. Remote sensing image fusion via wavelet transform and sparse representation. ISPRS J. Photogramm. Remote Sens. 2015, 104, 158–173. [Google Scholar] [CrossRef]
Chun-Man, Y.; Bao-Long, G.; Meng, Y. Fast algorithm for nonsubsampled contourlet transform. Acta Autom. Sin. 2014, 40, 757–762. [Google Scholar]
Moonon, A.-U.; Hu, J.; Li, S. Remote sensing image fusion method based on nonsubsampled shearlet transform and sparse representation. Sens. Imaging 2015, 16, 23. [Google Scholar] [CrossRef]
Fang, F.; Li, F.; Shen, C.; Zhang, G. A variational approach for pan-sharpening. IEEE Trans. Image Process. 2013, 22, 2822–2834. [Google Scholar] [CrossRef]
Palsson, F.; Sveinsson, J.R.; Ulfarsson, M.O. A new pansharpening algorithm based on total variation. IEEE Geosci. Remote Sens. Lett. 2013, 11, 318–322. [Google Scholar] [CrossRef]
Wang, T.; Fang, F.; Li, F.; Zhang, G. High-quality Bayesian pansharpening. IEEE Trans. Image Process. 2018, 28, 227–239. [Google Scholar] [CrossRef]
Ghahremani, M.; Liu, Y.; Yuen, P.; Behera, A. Remote sensing image fusion via compressive sensing. ISPRS J. Photogramm. Remote Sens. 2019, 152, 34–48. [Google Scholar] [CrossRef]
Masi, G.; Cozzolino, D.; Verdoliva, L.; Scarpa, G. Pansharpening by convolutional neural networks. Remote Sens. 2016, 8, 594. [Google Scholar] [CrossRef]
Scarpa, G.; Vitale, S.; Cozzolino, D. Target-adaptive CNN-based pansharpening. IEEE Trans. Geosci. Remote Sens. 2018, 56, 5443–5457. [Google Scholar] [CrossRef]
Wang, Z.; Ma, Y.; Zhang, Y. Review of pixel-level remote sensing image fusion based on deep learning. Inf. Fusion 2023, 90, 36–58. [Google Scholar] [CrossRef]
Javan, F.D.; Samadzadegan, F.; Mehravar, S.; Toosi, A.; Khatami, R.; Stein, A. A review of image fusion techniques for pan-sharpening of high-resolution satellite imagery. ISPRS J. Photogramm. Remote Sens. 2021, 171, 101–117. [Google Scholar] [CrossRef]
Wu, Z.-C.; Huang, T.-Z.; Deng, L.-J.; Vivone, G.; Miao, J.-Q.; Hu, J.-F.; Zhao, X.-L. A new variational approach based on proximal deep injection and gradient intensity similarity for spatio-spectral image fusion. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 6277–6290. [Google Scholar] [CrossRef]
Liu, P.; Liu, J.; Xiao, L. A unified pansharpening method with structure tensor driven spatial consistency and deep plug-and-play priors. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5413314. [Google Scholar] [CrossRef]
Lu, H.; Yang, Y.; Huang, S.; Chen, X.; Su, H.; Tu, W. Intensity mixture and band-adaptive detail fusion for pansharpening. Pattern Recognit. 2023, 139, 109434. [Google Scholar] [CrossRef]
Xiao, J.L.; Huang, T.Z.; Deng, L.J.; Wu, Z.C.; Vivone, G. A New Context-Aware Details Injection Fidelity with Adaptive Coefficients Estimation for Variational Pansharpening. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–15. [Google Scholar] [CrossRef]
Ayas, S.; Gormus, E.T.; Ekinci, M. An efficient pan sharpening via texture based dictionary learning and sparse representation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 2448–2460. [Google Scholar] [CrossRef]
Ruder, S. An overview of gradient descent optimization algorithms. arXiv 2016, arXiv:1609.04747. [Google Scholar]
Wu, C.; Tai, X.-C. Augmented Lagrangian method, dual methods, and split Bregman iteration for ROF, vectorial TV, and high order models. SIAM J. Imaging Sci. 2010, 3, 300–339. [Google Scholar] [CrossRef]
Kim, D.; Fessler, J.A. Another look at the fast iterative shrinkage/thresholding algorithm (FISTA). SIAM J. Optim. 2018, 28, 223–250. [Google Scholar] [CrossRef]
Boyd, S.; Parikh, N.; Chu, E.; Peleato, B.; Eckstein, J. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends® Mach. Learn. 2011, 3, 1–122. [Google Scholar]
Vivone, G.; Addesso, P.; Restaino, R.; Dalla Mura, M.; Chanussot, J. Pansharpening based on deconvolution for multiband filter estimation. IEEE Trans. Geosci. Remote Sens. 2018, 57, 540–553. [Google Scholar] [CrossRef]
Zhang, K.; Zhang, F.; Feng, Z.; Sun, J.; Wu, Q. Fusion of panchromatic and multispectral images using multiscale convolution sparse decomposition. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 14, 426–439. [Google Scholar] [CrossRef]
Dosselmann, R.; Yang, X.D. A comprehensive assessment of the structural similarity index. Signal Image Video Process. 2011, 5, 81–91. [Google Scholar] [CrossRef]
Leung, Y.; Liu, J.; Zhang, J. An improved adaptive intensity–hue–saturation method for the fusion of remote sensing images. IEEE Geosci. Remote Sens. Lett. 2013, 11, 985–989. [Google Scholar] [CrossRef]
Lee, J.; Lee, C. Fast and efficient panchromatic sharpening. IEEE Trans. Geosci. Remote Sens. 2009, 48, 155–163. [Google Scholar]
Aiazzi, B.; Alparone, L.; Baronti, S.; Garzelli, A.; Selva, M. MTF-tailored multiscale fusion of high-resolution MS and Pan imagery. Photogramm. Eng. Remote Sens. 2006, 72, 591–596. [Google Scholar] [CrossRef]
Yao, W.; Li, L. A new regression model: Modal linear regression. Scand. J. Stat. 2014, 41, 656–671. [Google Scholar] [CrossRef]
Aiazzi, B.; Baronti, S.; Selva, M. Improving component substitution pansharpening through multivariate regression of MS $+ $ Pan data. IEEE Trans. Geosci. Remote Sens. 2007, 45, 3230–3239. [Google Scholar] [CrossRef]
Ghahremani, M.; Ghassemian, H. Nonlinear IHS: A promising method for pan-sharpening. IEEE Geosci. Remote Sens. Lett. 2016, 13, 1606–1610. [Google Scholar] [CrossRef]
Vivone, G. Robust band-dependent spatial-detail approaches for panchromatic sharpening. IEEE Trans. Geosci. Remote Sens. 2019, 57, 6421–6433. [Google Scholar] [CrossRef]
Deng, L.-J.; Vivone, G.; Jin, C.; Chanussot, J. Detail injection-based deep convolutional neural networks for pansharpening. IEEE Trans. Geosci. Remote Sens. 2021, 59, 6995–7010. [Google Scholar] [CrossRef]
Ranchin, T.; Wald, L. Fusion of high spatial and spectral resolution images: The ARSIS concept and its implementation. Photogramm. Eng. Remote Sens. 2000, 66, 49–61. [Google Scholar]
Lolli, S.; Alparone, L.; Garzelli, A.; Vivone, G. Haze correction for contrast-based multispectral pansharpening. IEEE Geosci. Remote Sens. Lett. 2017, 14, 2255–2259. [Google Scholar] [CrossRef]
Vicinanza, M.R.; Restaino, R.; Vivone, G.; Dalla Mura, M.; Chanussot, J. A pansharpening method based on the sparse representation of injected details. IEEE Geosci. Remote Sens. Lett. 2015, 12, 180–184. [Google Scholar] [CrossRef]
Deng, L.-J.; Vivone, G.; Paoletti, M.E.; Scarpa, G.; He, J.; Zhang, Y.; Chanussot, J.; Plaza, A. Machine learning in pansharpening: A benchmark, from shallow to deep networks. IEEE Geosci. Remote Sens. Mag. 2022, 10, 279–315. [Google Scholar] [CrossRef]
Wald, L.; Ranchin, T.; Mangolini, M. Fusion of satellite images of different spatial resolutions: Assessing the quality of resulting images. Photogramm. Eng. Remote Sens. 1997, 63, 691–699. [Google Scholar]
Aiazzi, B.; Baronti, S.; Selva, M.; Alparone, L. Bi-cubic interpolation for shift-free pan-sharpening. ISPRS J. Photogramm. Remote Sens. 2013, 86, 65–76. [Google Scholar] [CrossRef]
Garzelli, A.; Nencini, F. Hypercomplex quality assessment of multi/hyperspectral images. IEEE Geosci. Remote Sens. Lett. 2009, 6, 662–665. [Google Scholar] [CrossRef]
Horé, A.; Ziou, D. Is there a relationship between peak-signal-to-noise ratio and structural similarity index measure? IET Image Process. 2013, 7, 12–24. [Google Scholar] [CrossRef]
Wang, Z.; Bovik, A.C. A universal image quality index. IEEE Signal Process. Lett. 2002, 9, 81–84. [Google Scholar] [CrossRef]
Choi, M. A new intensity-hue-saturation fusion approach to image fusion with a tradeoff parameter. IEEE Trans. Geosci. Remote Sens. 2006, 44, 1672–1682. [Google Scholar] [CrossRef]
Renza, D.; Martinez, E.; Arquero, A. A new approach to change detection in multispectral images by means of ERGAS index. IEEE Geosci. Remote Sens. Lett. 2012, 10, 76–80. [Google Scholar] [CrossRef]
Khalaf, A.F.; Owis, M.I.; Yassine, I.A.J.E.S.w.A. A novel technique for cardiac arrhythmia classification using spectral correlation and support vector machines. Expert Syst. Appl. 2015, 42, 8361–8368. [Google Scholar] [CrossRef]
Alparone, L.; Aiazzi, B.; Baronti, S.; Garzelli, A.; Nencini, F.; Selva, M. Multispectral and panchromatic data fusion assessment without reference. Photogramm. Eng. Remote Sens. 2008, 74, 193–200. [Google Scholar] [CrossRef]

Figure 1. The proposed model framework diagram.

Figure 2. Iterative convergence results from the WorldView-3 dataset.

Figure 3. Subjective evaluation fusion results of the RS images in the QuickBird dataset.

Figure 4. Subjective evaluation fusion results of the RS images in the WorldView-2 dataset.

Figure 5. Subjective evaluation fusion results of the RS images in the WorldView-3 dataset.

Figure 6. Subjective evaluation fusion results of the FS images in the GaoFen-2 dataset.

Figure 7. Subjective evaluation fusion results of the FS images in the WorldView-2 dataset.

Figure 8. Parameter settings for four different datasets in RS and FS experiments.

Figure 9. Subjective evaluation fusion results of different ablation combination models from the WorldView-3 dataset.

Table 1. Detailed information of the datasets used in this experiment.

Satellite	MS Bands	Experiment Categorization	Sensor	Sizes	Resolution (m)
GaoFen-2	Blue (B), green (G), red (R), and near infrared (NIR)	FS	MS	$128 \times 128 \times 4$	4
GaoFen-2		FS	PAN	$512 \times 512$	1
QuickBird		RS	MS	$64 \times 64 \times 4$	2.44
QuickBird		RS	PAN	$256 \times 256$	0.61
WorldView-2	Coastal blue, B, G, yellow, R, red edge, NIR1, and NIR2	RS/FS	MS	$64 \times 64 \times 8$ $/ 128 \times 128 \times 8$	2
WorldView-2		RS/FS	PAN	$256 \times 256$ $/ 512 \times 512$	0.5
WorldView-3		RS	MS	$64 \times 64 \times 8$	1.24
WorldView-3		RS	PAN	$256 \times 256$	0.31

Table 2. Objective evaluation fusion results of the RS images in the QuickBird dataset.

Methods	Q4 (1)	PSNR (+∞)	UIQI (1)	RASE (0)	RMSE (0)	ERGAS (0)	SCC (1)	CC (1)	SSIM (1)	Time (s)
GSA	0.7204	28.0864	0.8680	45.9107	82.0974	11.9279	0.8384	0.8942	0.8459	0.09
NIHS	0.7359	30.3936	0.8389	37.2876	64.8527	9.2502	0.7884	0.8790	0.8126	0.02
BDSD-PC	0.7787	31.0244	0.8728	34.3589	60.0012	8.8712	0.8241	0.8929	0.8505	0.11
FusionNet	0.7661	30.1892	0.9052	36.9432	65.2170	7.5823	0.8379	0.9020	0.8843	0.47
ATWT-M3	0.7488	30.3354	0.8406	37.5634	65.3632	9.2636	0.8173	0.8747	0.8199	0.12
BT-H	0.7242	28.9158	0.8928	42.6022	75.3440	8.2748	0.8588	0.9072	0.8758	0.03
SR-D	0.7816	31.0340	0.8772	34.2226	59.8619	8.4613	0.8013	0.8872	0.8515	0.69
DMPIF	0.6629	30.2939	0.8904	36.4980	64.5438	9.3122	0.8485	0.8572	0.8544	4.14
CDIF	0.8426	32.0266	0.9133	31.0258	53.8422	7.5931	0.7077	0.9028	0.8920	32.31
A-PNN	0.8315	31.5654	0.9071	32.5016	56.5681	8.0506	0.7775	0.8905	0.8840	0.24
Proposed	0.8579	32.5524	0.9272	28.5341	50.0224	7.0273	0.8595	0.9178	0.9123	0.66

Table 3. Objective evaluation fusion results of the RS images in the WorldView-2 dataset.

Methods	Q8 (1)	PSNR (+∞)	UIQI (1)	RASE (0)	RMSE (0)	ERGAS (0)	SCC (1)	CC (1)	SSIM (1)	Time (s)
GSA	0.8154	24.5476	0.8886	23.8290	126.7365	5.8211	0.9066	0.9140	0.8839	0.04
NIHS	0.8718	26.5331	0.9432	19.2262	101.7211	4.7072	0.8983	0.9167	0.9363	0.01
BDSD-PC	0.8484	25.5758	0.9340	21.0005	112.0279	5.3739	0.8675	0.9095	0.9247	0.10
FusionNet	0.8979	26.8927	0.9555	18.0786	96.3830	4.4973	0.8972	0.9194	0.9489	0.41
ATWT-M3	0.8262	25.1100	0.9234	22.9734	120.8218	5.5593	0.8554	0.8936	0.9104	0.25
BT-H	0.8836	24.8875	0.9597	22.0082	119.0180	4.7140	0.9211	0.9300	0.9543	0.08
SR-D	0.8475	25.4401	0.9347	21.4360	114.1270	5.2281	0.8042	0.8972	0.9220	0.97
DMPIF	0.8910	27.1957	0.9575	17.0660	91.8009	4.2016	0.9054	0.9217	0.9507	4.47
CDIF	0.8407	24.9159	0.9321	22.7995	121.3268	5.5670	0.6384	0.8878	0.9157	32.67
A-PNN	0.9149	27.7784	0.9617	16.2140	86.6690	4.0000	0.9143	0.9262	0.9562	0.19
Proposed	0.9483	29.3102	0.9732	13.2903	71.7632	3.3109	0.9412	0.9389	0.9695	0.67

Table 4. Objective evaluation fusion results of the RS images in the WorldView-3 dataset.

Methods	Q8 (1)	PSNR (+∞)	UIQI (1)	RASE (0)	RMSE (0)	ERGAS (0)	SCC (1)	CC (1)	SSIM (1)	Time (s)
GSA	0.8751	31.5699	0.9319	14.2936	58.2575	3.3419	0.9211	0.9377	0.9261	0.04
NIHS	0.7839	29.8210	0.8978	17.8321	72.3206	4.1553	0.8691	0.9135	0.8865	0.01
BDSD-PC	0.8185	30.3303	0.9203	16.1767	66.4502	3.9888	0.8998	0.9284	0.9119	0.10
FusionNet	0.8897	31.6441	0.9517	13.4228	55.9061	3.2604	0.9047	0.9357	0.9442	0.66
ATWT-M3	0.8025	29.6295	0.8928	18.7549	75.3322	4.3115	0.8640	0.9068	0.8794	0.42
BT-H	0.8032	28.0027	0.9486	20.3187	84.6757	4.2468	0.9219	0.9388	0.9404	0.06
SR-D	0.8178	29.9302	0.9103	17.2519	70.4718	4.0384	0.8434	0.9054	0.8962	1.13
DMPIF	0.8684	31.8053	0.9511	13.2660	55.0638	3.1358	0.9279	0.9357	0.9425	4.64
CDIF	0.8573	30.5537	0.9294	15.9662	65.3548	3.7505	0.7900	0.9130	0.9169	36.70
A-PNN	0.8937	31.0437	0.9386	14.1669	59.1147	3.4508	0.8945	0.9268	0.9291	0.30
Proposed	0.9206	32.8589	0.9579	11.5778	48.2269	2.8134	0.9308	0.9470	0.9529	1.03

Table 5. Objective evaluation fusion results of the FS images in the GaoFen-2 dataset.

Methods	$D_{λ}$ (0)	$D_{s}$ (0)	QNR (1)	Time (s)
GSA	0.2093	0.1456	0.6756	0.12
NIHS	0.0047	0.1132	0.8826	0.03
BDSD-PC	0.0067	0.1128	0.8812	0.15
FusionNet	0.0891	0.0774	0.8403	2.58
ATWT-M3	0.0076	0.1504	0.8431	0.89
BT-H	0.1434	0.1504	0.7278	0.10
SR-D	0.0092	0.1168	0.8751	2.11
DMPIF	0.0693	0.0970	0.8404	17.31
CDIF	0.0227	0.0590	0.9196	120.00
A-PNN	0.1177	0.1226	0.7741	1.07
Proposed	0.0263	0.0516	0.9234	3.29

Table 6. Objective evaluation fusion results of the FS images in the WorldView-2 dataset.

Methods	$D_{λ}$ (0)	$D_{s}$ (0)	QNR (1)	Time (s)
GSA	0.1208	0.1489	0.8511	0.52
NIHS	0.0004	0.0702	0.9298	0.10
BDSD-PC	0.0028	0.0778	0.9222	0.96
FusionNet	0.0224	0.0805	0.9195	1.32
ATWT-M3	0.0072	0.0857	0.9143	1.68
BT-H	0.0598	0.0795	0.8654	0.09
SR-D	0.0593	0.0596	0.8847	4.00
DMPIF	0.0112	0.0889	0.9111	25.40
CDIF	0.0156	0.0767	0.9233	140.74
A-PNN	0.0320	0.1121	0.8879	0.58
Proposed	0.0007	0.0560	0.9433	4.64

Table 7. Objective evaluation fusion results of different ablation combination models from the WorldView-3 dataset.

Models	MTC					AEDF	Q8 (1)	PSNR (+∞)	UIQI (1)	RASE (0)	ERGAS (0)	SCC (1)
Models	$α = 0$	$β = 0$	$γ = 0$	$δ = 0$	$θ = 0$	AEDF	Q8 (1)	PSNR (+∞)	UIQI (1)	RASE (0)	ERGAS (0)	SCC (1)
1	√	√	√	√	√	×	0.8252	28.7842	0.9044	17.8615	4.5200	0.8993
2	×	√	√	√	√	×	0.8424	29.3984	0.9093	16.8916	4.1640	0.8913
3	×	×	√	√	×	×	0.8539	29.4561	0.9169	16.5778	4.1785	0.9018
4	×	×	×	×	×	×	0.8787	30.6656	0.9326	14.5402	3.6459	0.9115
5	×	×	×	×	×	√	0.9206	32.8589	0.9579	11.5778	2.8134	0.9308

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Liu, D.; Wang, E.; Wang, L.; Benediktsson, J.A.; Wang, J.; Deng, L. Pansharpening Based on Multimodal Texture Correction and Adaptive Edge Detail Fusion. Remote Sens. 2024, 16, 2941. https://doi.org/10.3390/rs16162941

AMA Style

Liu D, Wang E, Wang L, Benediktsson JA, Wang J, Deng L. Pansharpening Based on Multimodal Texture Correction and Adaptive Edge Detail Fusion. Remote Sensing. 2024; 16(16):2941. https://doi.org/10.3390/rs16162941

Chicago/Turabian Style

Liu, Danfeng, Enyuan Wang, Liguo Wang, Jón Atli Benediktsson, Jianyu Wang, and Lei Deng. 2024. "Pansharpening Based on Multimodal Texture Correction and Adaptive Edge Detail Fusion" Remote Sensing 16, no. 16: 2941. https://doi.org/10.3390/rs16162941

APA Style

Liu, D., Wang, E., Wang, L., Benediktsson, J. A., Wang, J., & Deng, L. (2024). Pansharpening Based on Multimodal Texture Correction and Adaptive Edge Detail Fusion. Remote Sensing, 16(16), 2941. https://doi.org/10.3390/rs16162941

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Pansharpening Based on Multimodal Texture Correction and Adaptive Edge Detail Fusion

Abstract

1. Introduction

2. Related Works

2.1. Injection Model

2.2. VO-Based Model

3. Methodology

3.1. The Proposed Model Framework

3.2. Multimodal Texture Correction Model

3.2.1. Intensity Correction Prior

3.2.2. Gradient Correction Prior

3.2.3. A-PNN-Based Deep Plug-and-Play Prior

3.2.4. Proposed Model

3.2.5. Adaptive Degradation Filter Algorithm

3.2.6. Optimization Model Algorithm

3.3. Adaptive Edge Detail Fusion Model

3.3.1. Adaptive Extraction of TC Image Detail and Applying Edge Protection

3.3.2. Extracting Detail from UPMS Image and Applying Edge Protection

3.3.3. Adaptive Edge Detail Fusion Process

3.3.4. Final Injection of Spatial Edge Detail Information

4. Experiments and Results

4.1. Experimental Design

4.2. Reduced-Scale Experiments

4.2.1. QuickBird Dataset

4.2.2. WorldView-2 Dataset

4.2.3. WorldView-3 Dataset

4.3. Full-Scale Experiments

4.3.1. GaoFen-2 Dataset

4.3.2. WorldView-2 Dataset

4.4. Parameters Analysis

4.5. Ablation Study

5. Conclusions

Author Contributions

Funding

Data Availability Statement

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

3.3.1. Adaptive Extraction of T_C Image Detail and Applying Edge Protection