Abstract
The fusion of infrared and visible images can fully leverage the respective advantages of each modality, providing a more comprehensive and richer set of information. This is applicable in various fields such as military surveillance, night navigation, and environmental monitoring. In this paper, a novel infrared and visible image fusion method based on sparse representation and guided filtering in the Laplacian pyramid (LP) domain is introduced. The source images are first decomposed by the LP into low- and high-frequency bands. Sparse representation has achieved significant effectiveness in image fusion and is used to process the low-frequency band; guided filtering has excellent edge-preserving properties and can effectively maintain the spatial continuity of the high-frequency bands. Therefore, guided filtering combined with the weighted sum of eight-neighborhood-based modified Laplacian (WSEML) is used to process the high-frequency bands. Finally, the inverse LP transform reconstructs the fused image. We conducted simulation experiments on the publicly available TNO dataset to validate the superiority of the proposed algorithm in fusing infrared and visible images. Our algorithm preserves both the thermal radiation characteristics of the infrared image and the detailed features of the visible image.
1. Introduction
Infrared and visible image fusion is a process that integrates the complementary information from infrared (IR) and visible light images to produce a single image that is more informative and suitable for human perception or automated analysis tasks [1]. This technique leverages the distinct advantages of both imaging modalities to enhance the visibility of features that are not apparent in either image alone [2,3].
Unlike visible light images, infrared images capture the thermal radiation emitted by objects. This allows for the detection of living beings, machinery, and other heat sources, even in total darkness or through obstructions like smoke and fog. IR imaging is invaluable for applications requiring visibility in low-light conditions, such as night-time surveillance, search and rescue operations, and wildlife observation [4].
Visible light images provide high-resolution details and color information, which are crucial for human interpretation and understanding of a scene. From photography to video surveillance, visible light imaging is the most common form of imaging, offering a straightforward depiction of the environment as perceived by the human eye. The fusion process integrates the thermal information from infrared images with the detail and color information from visible images [5,6,7,8]. This results in images that highlight both the thermal signatures and the detailed scene information. By combining these two types of images, the fused image enhances the ability to detect and recognize subjects and objects in various conditions, including complete darkness, smoke, fog, and camouflage situations.
Various algorithms and techniques, including multi-resolution analysis, image decomposition, and feature-based methods, have been developed to fuse the images. A major challenge in image fusion is to maintain and highlight the essential details from both source images in the combined image, while avoiding artifacts and ensuring that no crucial information is lost [9,10,11,12,13,14,15,16,17]. For some applications, such as surveillance and automotive safety, the ability to process and fuse images in real time is crucial. This creates difficulties in terms of processing efficiency and the fine-tuning of algorithms.
During the fusion process, some information may be lost or confused, especially in areas with strong contrast or rich details, where the fusion algorithm might not fully retain the information from each image. Additionally, noise or artifacts may be introduced during the fusion process, affecting the quality of the final image. To enhance the performance of the fused image in terms of both thermal radiation characteristics and detail clarity, a fusion method utilizing sparse representation and guided filtering in the Laplacian pyramid domain is constructed. Sparse representation has demonstrated excellent results in image fusion; it is used to process the low-frequency sub-bands, and guided filtering combined with the weighted sum of eight-neighborhood-based modified Laplacian (WSEML) is utilized to process the high-frequency sub-bands. Through experiments and validation on the publicly available TNO dataset, our algorithm has achieved significant fusion effects, incorporating both infrared characteristics and scene details. This is advantageous for subsequent target detection and recognition tasks.
The paper is structured as follows: Section 2 reviews related research. Section 3 introduces the Laplacian pyramid transform. Section 4 details the proposed fusion approach. Section 5 shows the experimental results and discussion. Finally, Section 6 concludes the paper. This structure ensures a clear progression through the background research, foundational concepts, algorithmic details, empirical findings, and concluding remarks, thereby comprehensively addressing the topic of image fusion in the Laplacian pyramid domain.
2. Related Works
2.1. Deep Learning on Image Fusion
Deep learning has achieved significant results in the field of image processing, with popular architectures including CNNs [18], GANs [19], the Swin Transformer [20,21], the Vision Transformer [22], and Mamba [23]. Deep learning has significantly advanced the field of image fusion by introducing models that can learn complex representations and fusion rules from data, leading to superior fusion performance compared with traditional techniques. Deep-learning models can automatically extract and merge the most pertinent features from both infrared and visible images, producing fused images that effectively combine the thermal information from infrared images with the detailed texture and color from visible images [24,25,26].
CNNs are widely employed as deep-learning models for image fusion. They excel at capturing spatial hierarchies in images through their deep architecture, making them ideal for tasks that involve spatial data, like images. In the context of image fusion, CNNs can be trained to identify and merge the salient features from both infrared and visible images, ensuring that the fused image retains critical information from both sources [27]. Liu et al. [28] introduced the fusion of infrared and visible images using CNNs. Their experimental findings showcase that this approach attains cutting-edge outcomes in both visual quality and objective metrics. Similarly, Yang et al. [29] devised a method for image fusion leveraging multi-scale convolutional neural networks alongside saliency weight maps.
GANs have also been applied to image fusion with promising results [30,31]. A GAN consists of two networks: a generator that creates images and a discriminator that evaluates them. For image fusion, the generator can be trained to produce fused images from the input images, while the discriminator ensures that the fused images are indistinguishable from real images in terms of quality and information content. This approach can result in high-quality fused images that effectively blend the characteristics of both modalities. Chang et al. [32] presented a GAN model incorporating dual fusion paths and a U-type discriminator; their experimental findings show that this approach outperforms competing methods.
Deep learning offers a powerful framework for image fusion, with the potential to significantly enhance the quality and usefulness of fused images across a wide range of applications. Ongoing research in this field focuses on developing more efficient, adaptable, and interpretable models that can provide even better fusion results.
2.2. Traditional Methods of Image Fusion
Traditional methods for image fusion focus on combining the complementary information from source images to enhance the visibility of features and improve the overall quality of the resulting image. These techniques are generally categorized by the domain in which the fusion takes place: transform-domain and spatial-domain methods [33,34,35,36,37].
In transform-domain methods, Chen et al. [38] introduced a spatial-frequency collaborative fusion framework for image fusion; this algorithm utilizes the properties of the nonsubsampled shearlet transform for decomposition and reconstruction. Chen et al. [39] introduced a fusion approach emphasizing edge consistency and correlation-driven integration: through nonsubsampled shearlet transform decomposition, detail layers housing image details and textures are obtained alongside a base layer containing the primary features. Li et al. [40] introduced a method for fusing infrared and visible images that leverages low-pass filtering and sparse representation. Chen et al. [41] introduced a multi-focus image fusion method based on complex sparse representation (CSR); this model leverages the properties of hypercomplex signals to obtain directional information from real-valued signals by extending them to complex form, and then decomposes these directional components into sparse coefficients using specific directional dictionaries. Unlike traditional SR models, this approach excels at capturing geometric structures in images, because CSR coefficients offer accurate measurements of detailed information along particular directions.
In spatial domain methods, Li et al. [42] introduced a neural-network-based approach to assess focus properties using measures like spatial frequency, visibility, and edge features within the source image blocks.
3. Laplacian Pyramid Transform
The Laplacian pyramid of an image can be obtained by computing the difference between every two consecutive layers of the Gaussian pyramid [43,44,45]. Suppose $G_0$ represents the matrix of the original image and $G_l$ represents the $l$-th layer of the Gaussian pyramid decomposition of the image, where the 0th layer is the image itself. The definition of $G_l$ is as follows [44]:

$$G_l(i,j)=\sum_{m=-2}^{2}\sum_{n=-2}^{2} w(m,n)\,G_{l-1}(2i+m,\,2j+n),\quad 1\le l\le N,\ 0\le i<R_l,\ 0\le j<C_l$$

where $N$ is the maximum number of layers in the Gaussian pyramid decomposition; $R_l$ and $C_l$ represent the number of rows and columns of the $l$-th layer image of the Gaussian pyramid, respectively. $w(m,n)$ is a low-pass window function of size $5\times 5$ [44,45]:

$$w=\frac{1}{256}\begin{bmatrix}1&4&6&4&1\\4&16&24&16&4\\6&24&36&24&6\\4&16&24&16&4\\1&4&6&4&1\end{bmatrix}$$

To compute the difference between the $l$-th layer image $G_l$ and the $(l+1)$-th layer image $G_{l+1}$ in the Gaussian pyramid, it is necessary to upsample the low-resolution image $G_{l+1}$ to match the size of the high-resolution image $G_l$. Opposite to the process of image downsampling (Reduce), the operation defined for image upsampling is called Expand:

$$G_l^{\ast}=\mathrm{Expand}(G_{l+1}),\quad 0\le l<N$$

where $G_l^{\ast}$ and $G_l$ have the same dimensions. The specific operation is achieved by interpolating and enlarging the $(l+1)$-th layer image $G_{l+1}$, as defined in Equation (3):

$$G_l^{\ast}(i,j)=4\sum_{m=-2}^{2}\sum_{n=-2}^{2} w(m,n)\,G_{l+1}\!\left(\frac{i+m}{2},\,\frac{j+n}{2}\right),\quad 0\le i<R_l,\ 0\le j<C_l$$

where

$$G_{l+1}\!\left(\frac{i+m}{2},\,\frac{j+n}{2}\right)=\begin{cases}G_{l+1}\!\left(\frac{i+m}{2},\,\frac{j+n}{2}\right), & \text{when } \frac{i+m}{2} \text{ and } \frac{j+n}{2} \text{ are integers}\\[4pt] 0, & \text{otherwise}\end{cases}$$
From Equation (4), it can be inferred that the newly interpolated pixels between the original pixels are determined by the weighted average of the original pixel intensities.
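For readers who prefer an operational view, the Reduce and Expand steps described above can be written in a few lines of array code. The following NumPy/SciPy sketch uses the binomial generating kernel given earlier; the function names (`reduce_layer`, `expand_layer`) and the reflective border handling are illustrative choices rather than part of the original formulation:

```python
import numpy as np
from scipy.ndimage import convolve

# 5x5 low-pass generating kernel w (binomial weights, normalized to sum to 1).
W = np.outer([1, 4, 6, 4, 1], [1, 4, 6, 4, 1]) / 256.0

def reduce_layer(img):
    """Reduce: low-pass filter with w, then keep every second row and column."""
    blurred = convolve(img.astype(float), W, mode="reflect")
    return blurred[::2, ::2]

def expand_layer(img, out_shape):
    """Expand: zero-insert to the target size, then filter with 4*w (Equations (3)-(4))."""
    up = np.zeros(out_shape, dtype=float)
    up[::2, ::2] = img[: (out_shape[0] + 1) // 2, : (out_shape[1] + 1) // 2]
    return convolve(up, 4.0 * W, mode="reflect")
```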
At this point, the difference between the expanded image $G_l^{\ast}$ and the $l$-th layer image $G_l$ in the pyramid can be obtained from the following equation:

$$LP_l=G_l-G_l^{\ast}=G_l-\mathrm{Expand}(G_{l+1}),\quad 0\le l<N$$

The above expression generates the $l$-th level of the Laplacian pyramid. Since $G_{l+1}$ is obtained from $G_l$ through low-pass filtering and downsampling, the details in $G_{l+1}$ are significantly fewer than those in $G_l$, so the detail information contained in $G_l^{\ast}$, the interpolated version of $G_{l+1}$, is still less than that in $G_l$. $LP_l$, as the difference between $G_l$ and $G_l^{\ast}$, therefore reflects the information difference between the two Gaussian pyramid layers $G_l$ and $G_{l+1}$: it contains the high-frequency detail information lost when $G_{l+1}$ is obtained through the blurring and downsampling of $G_l$.

The complete definition of the Laplacian pyramid is as follows:

$$\begin{cases}LP_l=G_l-\mathrm{Expand}(G_{l+1}), & 0\le l<N\\ LP_N=G_N, & l=N\end{cases}$$

Thus, $\{LP_0, LP_1, \ldots, LP_N\}$ forms the Laplacian pyramid of the image, where each layer is the difference between the corresponding layer of the Gaussian pyramid and the upsampled version of the layer above it. This process is akin to bandpass filtering; therefore, the Laplacian pyramid can also be referred to as a bandpass tower decomposition.

The decomposition process of the Laplacian pyramid can be summarized into four steps: low-pass filtering, downsampling, interpolation, and differencing (bandpass filtering). Figure 1 shows the decomposition and reconstruction process of the Laplacian pyramid transform. A series of pyramid images obtained through Laplacian decomposition can be reconstructed into the original image through an inverse transformation process. Below, we derive the reconstruction method based on Equation (7): rearranging $LP_l=G_l-\mathrm{Expand}(G_{l+1})$ gives $G_l=LP_l+\mathrm{Expand}(G_{l+1})$, and the recursion starts from the top level, where $G_N=LP_N$.
Figure 1.
Laplacian pyramid. (a) Three-level Laplacian pyramid decomposition diagram; (b) Three-level Laplacian reconstruction diagram.
In summary, the reconstruction formula for the Laplacian pyramid can be expressed as

$$\begin{cases}G_N=LP_N, & l=N\\ G_l=LP_l+\mathrm{Expand}(G_{l+1}), & 0\le l<N\end{cases}$$
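Building on the `reduce_layer` and `expand_layer` sketches above, the decomposition and reconstruction of this section can be outlined as follows. This is a minimal illustrative sketch: the number of levels `N` is a free parameter (Section 5.2 later selects 4), and numerical round-off from border handling is ignored:

```python
def laplacian_pyramid(img, N):
    """Decompose img into detail levels LP_0..LP_{N-1} plus the top level LP_N = G_N."""
    pyramid, current = [], img.astype(float)
    for _ in range(N):
        coarser = reduce_layer(current)                                  # G_{l+1}
        pyramid.append(current - expand_layer(coarser, current.shape))   # LP_l
        current = coarser
    pyramid.append(current)                                              # LP_N = G_N
    return pyramid

def reconstruct(pyramid):
    """Invert the pyramid: G_l = LP_l + Expand(G_{l+1}), starting from G_N = LP_N."""
    current = pyramid[-1]
    for detail in reversed(pyramid[:-1]):
        current = detail + expand_layer(current, detail.shape)
    return current
```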
4. Proposed Fusion Method
In this section, we present a technique for fusing infrared and visible images using sparse representation and guided filtering within the Laplacian pyramid framework. The method involves four main stages: image decomposition, low-frequency fusion, high-frequency fusion, and image reconstruction. The structure of the proposed method is shown in Figure 2.
Figure 2.
The structure of the proposed method.
4.1. Image Decomposition
Each source image undergoes decomposition into a Laplacian pyramid (LP), yielding a low-frequency band and a series of high-frequency bands. The LP transform is applied separately to the source images A and B, resulting in $\{L_A^l\}_{l=0}^{N}$ and $\{L_B^l\}_{l=0}^{N}$, which represent the $l$-th layer of the decomposition of each source image. When $l=N$, $L_A^N$ and $L_B^N$ are the decomposed top-level images (i.e., the low-frequency information).
4.2. Low-Frequency Fusion
The low-frequency band effectively encapsulates the general structure and energy of the image. Sparse representation [1] has demonstrated efficacy in image fusion tasks; hence, it is employed to process the low-frequency band.
The sliding window technique is used to partition $L_A^N$ and $L_B^N$ into image patches of size $\sqrt{n}\times\sqrt{n}$, from the upper left to the lower right, with a step length of $s$ pixels. Suppose there are $T$ patches, represented as $\{P_A^i\}_{i=1}^{T}$ and $\{P_B^i\}_{i=1}^{T}$ in $L_A^N$ and $L_B^N$, respectively.

For each position $i$, rearrange $P_A^i$ and $P_B^i$ into column vectors $V_A^i$ and $V_B^i$, and then normalize each vector’s mean value to zero to generate $\hat{V}_A^i$ and $\hat{V}_B^i$ using the following equations [1]:

$$\hat{V}_A^i=V_A^i-\bar{v}_A^i\cdot\mathbf{1},\qquad \hat{V}_B^i=V_B^i-\bar{v}_B^i\cdot\mathbf{1}$$

where $\mathbf{1}$ denotes an all-one valued vector, and $\bar{v}_A^i$ and $\bar{v}_B^i$ are the mean values of all the elements in $V_A^i$ and $V_B^i$, respectively.

To compute the sparse coefficient vectors $\alpha_A^i$ and $\alpha_B^i$ of $\hat{V}_A^i$ and $\hat{V}_B^i$, we employ the orthogonal matching pursuit (OMP) technique, applying the following formulas:

$$\alpha_A^i=\underset{\alpha}{\arg\min}\;\|\alpha\|_0\quad \text{s.t.}\quad \left\|\hat{V}_A^i-D\alpha\right\|_2<\varepsilon$$

$$\alpha_B^i=\underset{\alpha}{\arg\min}\;\|\alpha\|_0\quad \text{s.t.}\quad \left\|\hat{V}_B^i-D\alpha\right\|_2<\varepsilon$$

Here, $D$ represents the learned dictionary obtained through the K-singular value decomposition (K-SVD) approach, and $\varepsilon$ is the error tolerance.

Next, $\alpha_A^i$ and $\alpha_B^i$ are combined using the “max-L1” rule to produce the fused sparse vector:

$$\alpha_F^i=\begin{cases}\alpha_A^i, & \text{if } \left\|\alpha_A^i\right\|_1>\left\|\alpha_B^i\right\|_1\\ \alpha_B^i, & \text{otherwise}\end{cases}$$

The fused result of $\hat{V}_A^i$ and $\hat{V}_B^i$ can then be calculated as

$$V_F^i=D\alpha_F^i+\bar{v}_F^i\cdot\mathbf{1}$$

where the merged mean value $\bar{v}_F^i$ can be computed as follows:

$$\bar{v}_F^i=\begin{cases}\bar{v}_A^i, & \text{if } \alpha_F^i=\alpha_A^i\\ \bar{v}_B^i, & \text{otherwise}\end{cases}$$

The above process is iterated for all the source image patches in $L_A^N$ and $L_B^N$ to generate all the fused vectors $\{V_F^i\}_{i=1}^{T}$. Let $L_F^N$ denote the low-pass fused result. For each $V_F^i$, reshape it into a patch $P_F^i$, and then plug $P_F^i$ into its original position in $L_F^N$. As the patches overlap, each pixel’s value in $L_F^N$ is averaged over its accumulation times.
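As a concrete illustration of the patch-wise procedure above, the sketch below fuses two low-frequency bands with OMP coding and the max-L1 rule. It assumes a pre-learned dictionary `D` (e.g., from K-SVD) whose columns are atoms of length 36 for 6 × 6 patches; the use of scikit-learn’s `orthogonal_mp`, the meaning of its `tol` argument, and the simple overlap averaging are illustrative choices consistent with, but not identical to, the exact settings of Section 5.1:

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def fuse_lowpass(LA, LB, D, patch=6, step=1, tol=0.1):
    """Patch-wise sparse-representation fusion of two low-frequency bands."""
    H, W = LA.shape
    fused = np.zeros((H, W))
    counts = np.zeros((H, W))
    for i in range(0, H - patch + 1, step):
        for j in range(0, W - patch + 1, step):
            vA = LA[i:i+patch, j:j+patch].reshape(-1, 1)
            vB = LB[i:i+patch, j:j+patch].reshape(-1, 1)
            mA, mB = vA.mean(), vB.mean()
            # Sparse coding of the zero-mean vectors with OMP.
            aA = orthogonal_mp(D, vA - mA, tol=tol)
            aB = orthogonal_mp(D, vB - mB, tol=tol)
            # Max-L1 rule: keep the coefficients (and mean) with the larger L1 norm.
            aF, mF = (aA, mA) if np.abs(aA).sum() > np.abs(aB).sum() else (aB, mB)
            vF = (D @ aF).reshape(patch, patch) + mF
            fused[i:i+patch, j:j+patch] += vF
            counts[i:i+patch, j:j+patch] += 1
    return fused / np.maximum(counts, 1)  # average overlapping contributions
```

In practice, the dictionary would be trained offline on a large collection of image patches, and any K-SVD implementation can be substituted.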
4.3. High-Frequency Fusion
The high-frequency bands contain the detailed information. The activity level measure, named the weighted sum of eight-neighborhood-based modified Laplacian (WSEML), is defined as follows [46]:

$$\mathrm{WSEML}_S^l(i,j)=\sum_{m=-r}^{r}\sum_{n=-r}^{r} W(m+r+1,\,n+r+1)\,\mathrm{EML}_S^l(i+m,\,j+n),\quad S\in\{A,B\},\ 0\le l<N$$

where $W$, the normalized model of the weighting matrix with radius $r=1$, is defined as follows:

$$W=\frac{1}{16}\begin{bmatrix}1&2&1\\2&4&2\\1&2&1\end{bmatrix}$$

and the eight-neighborhood-based modified Laplacian $\mathrm{EML}_S^l$ is computed by

$$\begin{aligned}\mathrm{EML}_S^l(i,j)={} & \left|2L_S^l(i,j)-L_S^l(i-1,j)-L_S^l(i+1,j)\right|+\left|2L_S^l(i,j)-L_S^l(i,j-1)-L_S^l(i,j+1)\right|\\ & +\frac{1}{\sqrt{2}}\left|2L_S^l(i,j)-L_S^l(i-1,j-1)-L_S^l(i+1,j+1)\right|\\ & +\frac{1}{\sqrt{2}}\left|2L_S^l(i,j)-L_S^l(i-1,j+1)-L_S^l(i+1,j-1)\right|\end{aligned}$$

Two zero-valued matrices, mapA and mapB, are initialized, and their entries are computed by

$$\mathrm{mapA}(i,j)=\begin{cases}1, & \text{if } \mathrm{WSEML}_A^l(i,j)\ge\mathrm{WSEML}_B^l(i,j)\\ 0, & \text{otherwise}\end{cases},\qquad \mathrm{mapB}(i,j)=1-\mathrm{mapA}(i,j)$$

Guided filtering, denoted as $GF_{r_g,\epsilon}(P,I)$, is a linear filtering technique [47,48]. Here, the parameters that control the size of the filter kernel and the extent of blur are represented by $r_g$ and $\epsilon$, respectively, while $P$ and $I$ denote the input image and the guidance image, respectively. To enhance the spatial continuity of the high-pass bands, guided filtering is applied to mapA and mapB, and we utilize the corresponding high-pass bands $L_A^l$ and $L_B^l$ as the guidance images:

$$\mathrm{mapA}^{\prime}=GF_{r_g,\epsilon}(\mathrm{mapA},\,L_A^l),\qquad \mathrm{mapB}^{\prime}=GF_{r_g,\epsilon}(\mathrm{mapB},\,L_B^l)$$

where $\mathrm{mapA}^{\prime}$ and $\mathrm{mapB}^{\prime}$ should be normalized, and the fused high-pass bands are calculated by

$$L_F^l(i,j)=\mathrm{mapA}^{\prime}(i,j)\,L_A^l(i,j)+\mathrm{mapB}^{\prime}(i,j)\,L_B^l(i,j),\quad 0\le l<N$$
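The high-frequency fusion step can be summarized in code as follows. The sketch assumes the guided filter implementation from the opencv-contrib package (`cv2.ximgproc.guidedFilter`); the radius `r`, the regularization `eps`, and the periodic-shift boundary handling inside the WSEML computation are illustrative simplifications rather than the exact settings of our experiments:

```python
import numpy as np
import cv2
from scipy.ndimage import convolve

def wseml(band):
    """Weighted sum of the eight-neighborhood-based modified Laplacian."""
    s = band.astype(np.float32)
    d = 1.0 / np.sqrt(2.0)
    eml = (np.abs(2*s - np.roll(s, 1, 0) - np.roll(s, -1, 0))
           + np.abs(2*s - np.roll(s, 1, 1) - np.roll(s, -1, 1))
           + d * np.abs(2*s - np.roll(np.roll(s, 1, 0), 1, 1) - np.roll(np.roll(s, -1, 0), -1, 1))
           + d * np.abs(2*s - np.roll(np.roll(s, 1, 0), -1, 1) - np.roll(np.roll(s, -1, 0), 1, 1)))
    W = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], np.float32) / 16.0
    return convolve(eml, W, mode="nearest")

def fuse_highpass(HA, HB, r=8, eps=1e-3):
    """Fuse one pair of high-pass bands via guided-filter-refined WSEML decision maps."""
    mapA = (wseml(HA) >= wseml(HB)).astype(np.float32)
    mapB = 1.0 - mapA
    # Each high-pass band serves as the guidance image for its own decision map.
    wA = cv2.ximgproc.guidedFilter(HA.astype(np.float32), mapA, r, eps)
    wB = cv2.ximgproc.guidedFilter(HB.astype(np.float32), mapB, r, eps)
    total = np.maximum(wA + wB, 1e-6)          # normalize the two weight maps
    return (wA * HA + wB * HB) / total
```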
4.4. Image Reconstruction
Finally, the corresponding inverse LP transform is performed on the fused bands $\{L_F^l\}_{l=0}^{N}$ to reconstruct the final fused image.
5. Experimental Results and Discussion
5.1. Experimental Setup
In this section, we conducted simulation experiments using the TNO public dataset [49] and compared the results through qualitative and quantitative evaluations. Figure 3 shows examples from the TNO dataset. We compared our algorithm with eight other image fusion algorithms, namely, ICA [50], ADKLT [51], MFSD [52], MDLatLRR [53], PMGI [54], RFNNest [55], EgeFusion [56], and LEDIF [57]. For quantitative evaluation, we adopted 10 commonly used evaluation metrics to assess the effectiveness of the algorithms: the edge-based similarity measurement $Q^{AB/F}$ [58,59,60,61,62,63], the human-perception-inspired metric $Q_{CB}$ [64,65], the structural-similarity-based metric $Q_E$ [64], the feature mutual information metric FMI [66], the gradient-based metric $Q_G$ [64], the mutual information metric MI [58,67], the nonlinear correlation information entropy $Q_{NCIE}$ [64], the normalized mutual information $Q_{NMI}$ [64], the phase-congruency-based metric $Q_P$ [64], and the structural-similarity-based metric $Q_Y$ introduced by Yang et al. [64,68,69]. $Q^{AB/F}$ computes and measures the amount of edge information transferred from the source images to the fused image using a Sobel edge detector. $Q_{CB}$ is a perceptual fusion metric based on human visual system (HVS) models. $Q_E$ takes the original images and their edge images into consideration at the same time. FMI calculates the regional mutual information between corresponding windows in the fused image and the two source images. $Q_G$ is obtained from the weighted average of the edge information preservation values. MI computes how much information from the source images is transferred to the fused image. $Q_{NCIE}$ is an information-theory-based metric. $Q_{NMI}$ is a quantitative measure of the mutual dependence of two variables. $Q_P$ provides an absolute measure of image features. $Q_Y$ is a fusion metric based on SSIM. For all the metrics, a higher value indicates better fusion performance.
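To make the evaluation protocol concrete, the sketch below implements one representative metric, the mutual information measure MI, from 256-bin joint histograms; it assumes 8-bit grayscale inputs, and the remaining metrics follow their respective references:

```python
import numpy as np

def mutual_information(x, y, bins=256):
    """MI between two 8-bit grayscale images, estimated from their joint histogram."""
    joint, _, _ = np.histogram2d(x.ravel(), y.ravel(), bins=bins, range=[[0, 256], [0, 256]])
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

def fusion_mi(src_a, src_b, fused):
    """Fusion MI: total information transferred from the two source images to the fused image."""
    return mutual_information(src_a, fused) + mutual_information(src_b, fused)
```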
Figure 3.
Examples from the TNO dataset.
The parameters for the compared algorithms correspond to the default parameters in the respective articles. For our method, the parameters are as follows: the dictionary size is 256, with K-SVD iterated 180 times; the patch size is 6 × 6, the step length is 1, and the error tolerance is 0.1 [1].
5.2. Analysis of LP Decomposition Levels
Figure 4 shows the fusion results of LP with different decomposition levels. From the figure, it can be observed that the fusion effects in Figure 4a–c are poor, with severe artifacts. The fusion results in Figure 4d–f are relatively similar. Table 1 provides evaluation metrics for 42 image pairs under different LP decomposition levels. Since the fusion results are poor for decomposition levels 1–3, we first exclude these settings. Comparing the average metric values for decomposition levels 4–6, we see that at level 4, five metrics are optimal. Therefore, we set the LP decomposition level to 4.
Figure 4.
Fusion results of different decomposition levels in LP. (a) 1 level; (b) 2 levels; (c) 3 levels; (d) 4 levels; (e) 5 levels; (f) 6 levels.
Table 1.
The average objective evaluation of different LP decomposition levels on 42 pairs of data from the TNO dataset.
5.3. Qualitative and Quantitative Analysis
Figure 5 illustrates the fusion outcomes of various methods applied to Data 1 alongside the corresponding metric data in Table 2. The ICA, ADKLT, PMGI, and RFNNest methods are observed to produce fused images that appear blurred, failing to maintain the thermal radiation characteristics and details present in the source images. Both MFSD and LEDIF methods yield similar fusion results, preserving human thermal radiation characteristics but suffering from noticeable loss of brightness information in specific areas. Conversely, the MDLatLRR and EgeFusion algorithms demonstrate over-sharpening effects, leading to artifacts and significant distortion in the fused images. Our algorithm enables comprehensive complementarity between the infrared and visible images while fully preserving the thermal infrared characteristics.
Figure 5.
Results on Data 1. (a) ICA; (b) ADKLT; (c) MFSD; (d) MDLatLRR; (e) PMGI; (f) RFNNest; (g) EgeFusion; (h) LEDIF; (i) Proposed.
Table 2.
The objective evaluation of different methods on Data 1.
From Table 2, it can be observed that our algorithm achieves optimal objective metrics on Data 1, with a $Q^{AB/F}$ value of 0.5860, a $Q_{CB}$ value of 0.6029, a $Q_E$ value of 0.7047, an FMI value of 0.9248, a $Q_G$ value of 0.5838, an MI value of 2.7156, a $Q_{NCIE}$ value of 0.8067, a $Q_{NMI}$ value of 0.3908, a $Q_P$ value of 0.3280, and a $Q_Y$ value of 0.8802.
Figure 6 displays the fusion results of various methods applied to Data 2, along with the corresponding metric data shown in Table 3. Observing the fusion results, it is evident that the ICA, ADKLT, and PMGI algorithms produced fused images that are blurred and exhibit low brightness. The MFSD, RFNNest, and LEDIF methods suffered from some loss of thermal radiation information. In contrast, the MDLatLRR and EgeFusion algorithms resulted in sharpened images, enhancing the human subjects but potentially causing distortion in other areas due to the sharpening effect. Our algorithm achieved the best fusion result.
Figure 6.
Results on Data 2. (a) ICA; (b) ADKLT; (c) MFSD; (d) MDLatLRR; (e) PMGI; (f) RFNNest; (g) EgeFusion; (h) LEDIF; (i) Proposed.
Table 3.
The objective evaluation of different methods on Data 2.
From Table 3, it is apparent that our algorithm achieved superior objective metrics on Data 2, with a $Q^{AB/F}$ value of 0.6880, a $Q_{CB}$ value of 0.6771, a $Q_E$ value of 0.7431, an FMI value of 0.9623, a $Q_G$ value of 0.6860, an MI value of 3.6399, a $Q_{NCIE}$ value of 0.8112, a $Q_{NMI}$ value of 0.5043, a $Q_P$ value of 0.2976, and a $Q_Y$ value of 0.9458.
Figure 7 depicts the fusion results of various methods applied to Data 3, accompanied by the corresponding metric data shown in Table 4. Analyzing the fusion outcomes, it is evident that the ICA and ADKLT algorithms produced blurry fused images with significant loss of information. The MFSD method introduced artifacts in certain regions. While the MDLatLRR and EgeFusion algorithms increased the overall brightness, they also introduced artifacts. The PMGI and RFNNest algorithms resulted in distorted fused images. The LEDIF algorithm achieved commendable fusion results, albeit with some artifacts present. Our algorithm yielded the best fusion result, achieving moderate brightness and preserving the thermal radiation characteristics.
Figure 7.
Results on Data 3. (a) ICA; (b) ADKLT; (c) MFSD; (d) MDLatLRR; (e) PMGI; (f) RFNNest; (g) EgeFusion; (h) LEDIF; (i) Proposed.
Table 4.
The objective evaluation of different methods on Data 3.
From Table 4, it is apparent that our algorithm attained optimal objective metrics on Data 3, with a $Q^{AB/F}$ value of 0.7252, a $Q_{CB}$ value of 0.6830, a $Q_E$ value of 0.8105, an FMI value of 0.8887, a $Q_G$ value of 0.7182, an MI value of 4.4156, a $Q_{NCIE}$ value of 0.8131, a $Q_{NMI}$ value of 0.6674, a $Q_P$ value of 0.8141, and a $Q_Y$ value of 0.9395.
Figure 8 displays the fusion results of various methods applied to Data 4, alongside the corresponding metric data shown in Table 5. Upon reviewing the fusion outcomes, it is evident that the fusion images produced by the ICA, ADKLT, MFSD, PMGI, and LEDIF algorithms exhibit some loss of brightness information. The MDLatLRR and EgeFusion algorithms sharpened the fused image, while the RFNNest method resulted in a darker fused image with some information loss. In contrast, our algorithm produced a fused image with complementary information.
Figure 8.
Results on Data 4. (a) ICA; (b) ADKLT; (c) MFSD; (d) MDLatLRR; (e) PMGI; (f) RFNNest; (g) EgeFusion; (h) LEDIF; (i) Proposed.
Table 5.
The objective evaluation of different methods on Data 4.
From Table 5, it is notable that our algorithm achieved optimal objective metrics on Data 4, with a $Q^{AB/F}$ value of 0.5947, a $Q_{CB}$ value of 0.5076, a $Q_E$ value of 0.6975, an FMI value of 0.9059, a $Q_G$ value of 0.5915, an MI value of 2.5337, a $Q_{NCIE}$ value of 0.8062, a $Q_{NMI}$ value of 0.3571, a $Q_P$ value of 0.5059, and a $Q_Y$ value of 0.8553.
Figure 9 provides detailed insights into the objective performance of the various fusion methods across 42 pairs of data from the TNO dataset. The horizontal axis indexes the image pairs used in our experiments, while the vertical axis represents the metric values. Each method’s scores across the different source images are plotted as curves, with the average score indicated in the legend. Figure 9 illustrates that most methods show consistent trends across the metrics examined, and nearly all fusion methods demonstrate stable performance across all test images, with few exceptions. Therefore, the comparisons based on the average values in Table 6 are meaningful.


Figure 9.
Objective performance of different methods on the TNO dataset.
Table 6.
The average objective evaluation of the different methods on 42 pairs of data from the TNO dataset.
5.4. Experimental Expansion
We expanded our proposed algorithm to include the fusion of multi-focus images from the Lytro [70] and MFI-WHU datasets [71], selecting 20 and 30 groups of data for testing, respectively. The simulation results for one of the data groups are shown in Figure 10. This extension involved a comparative evaluation against eight methods: ICA [50], FusionDN [72], PMGI [54], U2Fusion [73], LEGFF [74], ZMFF [75], EgeFusion [56], and LEDIF [57]. The assessment utilized both subjective visual inspection and objective metrics. Figure 11 and Figure 12 provide detailed insights into the objective performance of various fusion methods on the Lytro and MFI-WHU datasets, with the corresponding average metric values shown in Table 7 and Table 8. From the results in Figure 10, it is evident that the ICA and PMGI algorithms tended to produce fused images with noticeable blurriness, impacting the clarity of detailed information within the fused images. The fused images produced by the FusionDN and U2Fusion algorithms exhibited dark regions in specific areas, such as hair regions in portraits, which detracted from overall visual quality. The fusion results of the LEGFF, ZMFF, and LEDIF algorithms are quite similar, all achieving fully focused fusion effects. The fused image generated by the EgeFusion algorithm showed distortions that made it challenging to discern detailed parts of the image. Our algorithm demonstrated promising results both visually and quantitatively when compared with the other algorithms. Subjective visual assessment indicated that our method effectively enhanced the presentation of complementary information in the fused images, preserving clarity and detail across different focus levels.
Figure 10.
Results on Lytro-01. (a) Near focus; (b) Far focus; (c) ICA; (d) FusionDN; (e) PMGI; (f) U2Fusion; (g) LEGFF; (h) ZMFF; (i) EgeFusion; (j) LEDIF; (k) Proposed.

Figure 11.
Objective performance of different methods on the Lytro dataset.

Figure 12.
Objective performance of different methods on the MFI-WHU dataset.
Table 7.
The average objective evaluation of different methods on 20 pairs of data from the Lytro dataset.
Table 8.
The average objective evaluation of different methods on 30 pairs of data from the MFI-WHU dataset.
6. Conclusions
To enhance the clarity and thermal radiation fidelity of infrared and visible image fusion, a fusion method based on sparse representation and guided filtering in the Laplacian pyramid domain is introduced. The Laplacian pyramid serves as an efficient multi-scale transform that decomposes the original image into distinct low- and high-frequency components. Low-frequency bands, crucial for capturing overall scene structure and thermal characteristics, are processed using the sparse representation technique. Sparse representation ensures that key features are preserved while reducing noise and maintaining thermal radiation attributes. High-frequency bands, which encompass fine details and textures vital for visual clarity, are enhanced using guided filtering integrated with WSEML. This approach successfully combines the contextual details from the source images, ensuring that the fused output maintains sharpness and fidelity across different scales. We carried out thorough simulation tests using the well-known TNO dataset to assess the performance of our algorithm. The results demonstrate that our method successfully preserves thermal radiation characteristics while enhancing scene details in the fused images. By continuing to innovate within the framework of sparse representation and guided filtering in the Laplacian pyramid domain, we aim to contribute significantly to the advancement of image fusion techniques, particularly in scenarios where preserving thermal characteristics and enhancing visual clarity are paramount. Moreover, we extended our approach to conducting fusion experiments on multi-focus images, achieving satisfactory results in capturing diverse focal points within a single fused output.
In our future research, we plan to further refine and expand our algorithm’s capabilities. Specifically, we aim to explore enhancements tailored for the fusion of synthetic aperture radar (SAR) and optical images [76]. By integrating SAR data, which provide unique insights into surface properties and structures, with optical imagery, which offers high-resolution contextual information, we anticipate developing a robust fusion framework capable of addressing diverse application scenarios effectively. Additionally, research on change detection based on fusion models is also one of our future research directions [77,78,79,80].
Author Contributions
The experimental measurements and data collection were carried out by L.L., Y.S., M.L. (Ming Lv), Z.J., M.L. (Minqin Liu), X.Z. (Xiaobin Zhao), X.Z. (Xueyu Zhang), and H.M. The manuscript was written by L.L. with the assistance of Y.S., M.L. (Ming Lv), Z.J., M.L. (Minqin Liu), X.Z. (Xiaobin Zhao), X.Z. (Xueyu Zhang), and H.M. All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported by the National Natural Science Foundation of China under Grant Nos. 92152109 and 62261053; the Technology Innovation Program of Beijing Institute of Technology under Grant No. 2024CX02065; the Cross-Media Intelligent Technology Project of Beijing National Research Center for Information Science and Technology (BNRist) under Grant No. BNR2019TD01022; and the Tianshan Talent Training Project-Xinjiang Science and Technology Innovation Team Program (2023TSYCTD0012).
Data Availability Statement
The TNO dataset can be accessed via the following link: https://figshare.com/articles/dataset/TNO_Image_Fusion_Dataset/1008029 (accessed on 2 July 2024).
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Liu, Y.; Liu, S.; Wang, Z. A general framework for image fusion based on multi-scale transform and sparse representation. Inf. Fusion 2015, 24, 147–164. [Google Scholar] [CrossRef]
- Huo, X.; Deng, Y.; Shao, K. Infrared and visible image fusion with significant target enhancement. Entropy 2022, 24, 1633. [Google Scholar] [CrossRef]
- Luo, Y.; Luo, Z. Infrared and visible image fusion: Methods, datasets, applications, and prospects. Appl. Sci. 2023, 13, 10891. [Google Scholar] [CrossRef]
- Li, L.; Lv, M.; Jia, Z.; Jin, Q.; Liu, M.; Chen, L.; Ma, H. An effective infrared and visible image fusion approach via rolling guidance filtering and gradient saliency map. Remote Sens. 2023, 15, 2486. [Google Scholar] [CrossRef]
- Ma, X.; Li, T.; Deng, J. Infrared and visible image fusion algorithm based on double-domain transform filter and contrast transform feature extraction. Sensors 2024, 24, 3949. [Google Scholar] [CrossRef]
- Wang, Q.; Yan, X.; Xie, W.; Wang, Y. Image fusion method based on snake visual imaging mechanism and PCNN. Sensors 2024, 24, 3077. [Google Scholar] [CrossRef]
- Feng, B.; Ai, C.; Zhang, H. Fusion of infrared and visible light images based on improved adaptive dual-channel pulse coupled neural network. Electronics 2024, 13, 2337. [Google Scholar] [CrossRef]
- Yang, H.; Zhang, J.; Zhang, X. Injected infrared and visible image fusion via L1 decomposition model and guided filtering. IEEE Trans. Comput. Imaging 2022, 8, 162–173. [Google Scholar]
- Zhang, X.; Boutat, D.; Liu, D. Applications of fractional operator in image processing and stability of control systems. Fractal Fract. 2023, 7, 359. [Google Scholar] [CrossRef]
- Zhang, X.; He, H.; Zhang, J. Multi-focus image fusion based on fractional order differentiation and closed image matting. ISA Trans. 2022, 129, 703–714. [Google Scholar] [CrossRef]
- Zhang, X.; Yan, H. Medical image fusion and noise suppression with fractional-order total variation and multi-scale decomposition. IET Image Process. 2021, 15, 1688–1701. [Google Scholar] [CrossRef]
- Yan, H.; Zhang, X. Adaptive fractional multi-scale edge-preserving decomposition and saliency detection fusion algorithm. ISA Trans. 2020, 107, 160–172. [Google Scholar] [CrossRef] [PubMed]
- Zhang, X.; Yan, H.; He, H. Multi-focus image fusion based on fractional-order derivative and intuitionistic fuzzy sets. Front. Inf. Technol. Electron. Eng. 2020, 21, 834–843. [Google Scholar] [CrossRef]
- Zhang, J.; Ding, J.; Chai, T. Fault-tolerant prescribed performance control of wheeled mobile robots: A mixed-gain adaption approach. IEEE Trans. Autom. Control 2024, 69, 5500–5507. [Google Scholar] [CrossRef]
- Zhang, J.; Xu, K.; Wang, Q. Prescribed performance tracking control of time-delay nonlinear systems with output constraints. IEEE/CAA J. Autom. Sin. 2024, 11, 1557–1565. [Google Scholar] [CrossRef]
- Wu, D.; Wang, Y.; Wang, H.; Wang, F.; Gao, G. DCFNet: Infrared and visible image fusion network based on discrete wavelet transform and convolutional neural network. Sensors 2024, 24, 4065. [Google Scholar] [CrossRef]
- Wei, Q.; Liu, Y.; Jiang, X.; Zhang, B.; Su, Q.; Yu, M. DDFNet-A: Attention-based dual-branch feature decomposition fusion network for infrared and visible image fusion. Remote Sens. 2024, 16, 1795. [Google Scholar] [CrossRef]
- Li, X.; He, H.; Shi, J. HDCCT: Hybrid densely connected CNN and transformer for infrared and visible image fusion. Electronics 2024, 13, 3470. [Google Scholar] [CrossRef]
- Mao, Q.; Zhai, W.; Lei, X.; Wang, Z.; Liang, Y. CT and MRI image fusion via coupled feature-learning GAN. Electronics 2024, 13, 3491. [Google Scholar] [CrossRef]
- Wang, Z.; Chen, Y.; Shao, W. SwinFuse: A residual swin transformer fusion network for infrared and visible images. IEEE Trans. Instrum. Meas. 2023, 71, 5016412. [Google Scholar] [CrossRef]
- Ma, J.; Tang, L.; Fan, F. SwinFusion: Cross-domain long-range learning for general image fusion via swin transformer. IEEE-CAA J. Autom. Sin. 2022, 9, 1200–1217. [Google Scholar] [CrossRef]
- Gao, F.; Lang, P.; Yeh, C.; Li, Z.; Ren, D.; Yang, J. An interpretable target-aware vision transformer for polarimetric HRRP target recognition with a novel attention loss. Remote Sens. 2024, 16, 3135. [Google Scholar] [CrossRef]
- Huang, L.; Chen, Y.; He, X. Spectral-spatial Mamba for hyperspectral image classification. Remote Sens. 2024, 16, 2449. [Google Scholar] [CrossRef]
- Zhang, X.; Demiris, Y. Visible and infrared image fusion using deep learning. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 10535–10554. [Google Scholar] [CrossRef]
- Zhang, X.; Ye, P.; Xiao, G. VIFB: A visible and infrared image fusion benchmark. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020. [Google Scholar]
- Li, H.; Wu, X. CrossFuse: A novel cross attention mechanism based infrared and visible image fusion approach. Inf. Fusion 2024, 103, 102147. [Google Scholar] [CrossRef]
- Liu, Y.; Chen, X.; Wang, Z. Deep learning for pixel-level image fusion: Recent advances and future prospects. Inf. Fusion 2018, 42, 158–173. [Google Scholar] [CrossRef]
- Liu, Y.; Chen, X.; Cheng, J. Infrared and visible image fusion with convolutional neural networks. Int. J. Wavelets Multiresolut. Inf. Process. 2018, 16, 1850018. [Google Scholar] [CrossRef]
- Yang, C.; He, Y. Multi-scale convolutional neural networks and saliency weight maps for infrared and visible image fusion. J. Vis. Commun. Image Represent. 2024, 98, 104015. [Google Scholar] [CrossRef]
- Wei, H.; Fu, X.; Wang, Z.; Zhao, J. Infrared/Visible light fire image fusion method based on generative adversarial network of wavelet-guided pooling vision transformer. Forests 2024, 15, 976. [Google Scholar] [CrossRef]
- Ma, J.; Xu, H. DDcGAN: A dual-discriminator conditional generative adversarial network for multi-resolution image fusion. IEEE Trans. Image Process. 2020, 29, 4980–4995. [Google Scholar] [CrossRef]
- Chang, L.; Huang, Y. DUGAN: Infrared and visible image fusion based on dual fusion paths and a U-type discriminator. Neurocomputing 2024, 578, 127391. [Google Scholar] [CrossRef]
- Lv, M.; Jia, Z.; Li, L.; Ma, H. Multi-focus image fusion via PAPCNN and fractal dimension in NSST domain. Mathematics 2023, 11, 3803. [Google Scholar] [CrossRef]
- Lv, M.; Li, L.; Jin, Q.; Jia, Z.; Chen, L.; Ma, H. Multi-focus image fusion via distance-weighted regional energy and structure tensor in NSCT domain. Sensors 2023, 23, 6135. [Google Scholar] [CrossRef]
- Li, L.; Lv, M.; Jia, Z.; Ma, H. Sparse representation-based multi-focus image fusion method via local energy in shearlet domain. Sensors 2023, 23, 2888. [Google Scholar] [CrossRef]
- Ma, J.; Ma, Y.; Li, C. Infrared and visible image fusion methods and applications: A survey. Inf. Fusion 2019, 45, 153–178. [Google Scholar] [CrossRef]
- Liu, Y.; Wang, L.; Cheng, J. Multi-focus image fusion: A survey of the state of the art. Inf. Fusion 2020, 64, 71–91. [Google Scholar] [CrossRef]
- Chen, H.; Deng, L. SFCFusion: Spatial-frequency collaborative infrared and visible image fusion. IEEE Trans. Instrum. Meas. 2024, 73, 5011615. [Google Scholar] [CrossRef]
- Chen, H.; Deng, L.; Zhu, L.; Dong, M. ECFuse: Edge-consistent and correlation-driven fusion framework for infrared and visible image fusion. Sensors 2023, 23, 8071. [Google Scholar] [CrossRef]
- Li, X.; Tan, H. Infrared and visible image fusion based on domain transform filtering and sparse representation. Infrared Phys. Technol. 2023, 131, 104701. [Google Scholar] [CrossRef]
- Chen, Y.; Liu, Y. Multi-focus image fusion with complex sparse representation. IEEE Sens. J. 2024; early access. [Google Scholar]
- Li, S.; Kwok, J.T.; Wang, Y. Multifocus image fusion using artificial neural networks. Pattern Recognit. Lett. 2002, 23, 985–997. [Google Scholar] [CrossRef]
- Chang, C.I.; Liang, C.C.; Hu, P.F. Iterative Gaussian–Laplacian pyramid network for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5510122. [Google Scholar] [CrossRef]
- Burt, P.J.; Adelson, E.H. The laplacian pyramid as a compact image code. IEEE Trans. Commun. 1983, 31, 532–540. [Google Scholar] [CrossRef]
- Chen, J.; Li, X.; Luo, L. Infrared and visible image fusion based on target-enhanced multiscale transform decomposition. Inf. Sci. 2020, 508, 64–78. [Google Scholar] [CrossRef]
- Yin, M.; Liu, X.; Liu, Y. Medical image fusion with parameter-adaptive pulse coupled neural network in nonsubsampled shearlet transform domain. IEEE Trans. Instrum. Meas. 2019, 68, 49–64. [Google Scholar] [CrossRef]
- He, K.; Sun, J.; Tang, X. Guided image filtering. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 1397–1409. [Google Scholar] [CrossRef]
- Li, S.; Kang, X.; Hu, J. Image fusion with guided filtering. IEEE Trans. Image Process. 2013, 22, 2864–2875. [Google Scholar]
- Available online: https://figshare.com/articles/dataset/TNO_Image_Fusion_Dataset/1008029 (accessed on 1 May 2024).
- Mitianoudis, N.; Stathaki, T. Pixel-based and region-based image fusion schemes using ICA bases. Inf. Fusion 2007, 8, 131–142. [Google Scholar] [CrossRef]
- Bavirisetti, D.P.; Dhuli, R. Fusion of infrared and visible sensor images based on anisotropic diffusion and Karhunen-Loeve transform. IEEE Sens. J. 2016, 16, 203–209. [Google Scholar] [CrossRef]
- Bavirisetti, D.P.; Dhuli, R. Two-scale image fusion of visible and infrared images using saliency detection. Infrared Phys. Technol. 2016, 76, 52–64. [Google Scholar] [CrossRef]
- Li, H.; Wu, X.; Kittler, J. MDLatLRR: A novel decomposition method for infrared and visible image fusion. IEEE Trans. Image Process. 2020, 29, 4733–4746. [Google Scholar] [CrossRef]
- Zhang, H.; Xu, H.; Xiao, Y. Rethinking the image fusion: A fast unified image fusion network based on proportional maintenance of gradient and intensity. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 12797–12804. [Google Scholar]
- Li, H.; Wu, X.; Kittler, J. RFN-Nest: An end-to-end residual fusion network for infrared and visible images. Inf. Fusion 2021, 73, 72–86. [Google Scholar] [CrossRef]
- Tang, H.; Liu, G. EgeFusion: Towards edge gradient enhancement in infrared and visible image fusion with multi-scale transform. IEEE Trans. Comput. Imaging 2024, 10, 385–398. [Google Scholar] [CrossRef]
- Xiang, W.; Shen, J.; Zhang, L.; Zhang, Y. Infrared and visual image fusion based on a local-extrema-driven image filter. Sensors 2024, 24, 2271. [Google Scholar] [CrossRef] [PubMed]
- Qu, X.; Yan, J.; Xiao, H. Image fusion algorithm based on spatial frequency-motivated pulse coupled neural networks in nonsubsampled contourlet transform domain. Acta Autom. Sin. 2008, 34, 1508–1514. [Google Scholar] [CrossRef]
- Li, S.; Han, M.; Qin, Y.; Li, Q. Self-attention progressive network for infrared and visible image fusion. Remote Sens. 2024, 16, 3370. [Google Scholar] [CrossRef]
- Li, L.; Zhao, X.; Hou, H.; Zhang, X.; Lv, M.; Jia, Z.; Ma, H. Fractal dimension-based multi-focus image fusion via coupled neural P systems in NSCT domain. Fractal Fract. 2024, 8, 554. [Google Scholar] [CrossRef]
- Zhai, H.; Ouyang, Y.; Luo, N. MSI-DTrans: A multi-focus image fusion using multilayer semantic interaction and dynamic transformer. Displays 2024, 85, 102837. [Google Scholar] [CrossRef]
- Li, L.; Ma, H.; Jia, Z.; Si, Y. A novel multiscale transform decomposition based multi-focus image fusion framework. Multimed. Tools Appl. 2021, 80, 12389–12409. [Google Scholar] [CrossRef]
- Li, B.; Zhang, L.; Liu, J.; Peng, H. Multi-focus image fusion with parameter adaptive dual channel dynamic threshold neural P systems. Neural Netw. 2024, 179, 106603. [Google Scholar] [CrossRef]
- Liu, Z.; Blasch, E.; Xue, Z. Objective assessment of multiresolution image fusion algorithms for context enhancement in night vision: A comparative study. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 94–109. [Google Scholar] [CrossRef]
- Zhai, H.; Chen, Y.; Wang, Y. W-shaped network combined with dual transformers and edge protection for multi-focus image fusion. Image Vis. Comput. 2024, 150, 105210. [Google Scholar] [CrossRef]
- Haghighat, M.; Razian, M. Fast-FMI: Non-reference image fusion metric. In Proceedings of the IEEE 8th International Conference on Application of Information and Communication Technologies, Astana, Kazakhstan, 15–17 October 2014; pp. 424–426. [Google Scholar]
- Wang, X.; Fang, L.; Zhao, J.; Pan, Z.; Li, H.; Li, Y. MMAE: A universal image fusion method via mask attention mechanism. Pattern Recognit. 2025, 158, 111041. [Google Scholar] [CrossRef]
- Zhang, X.; Li, W. Hyperspectral pathology image classification using dimension-driven multi-path attention residual network. Expert Syst. Appl. 2023, 230, 120615. [Google Scholar] [CrossRef]
- Zhang, X.; Li, Q. FD-Net: Feature distillation network for oral squamous cell carcinoma lymph node segmentation in hyperspectral imagery. IEEE J. Biomed. Health Inform. 2024, 28, 1552–1563. [Google Scholar] [CrossRef]
- Nejati, M.; Samavi, S.; Shirani, S. Multi-focus image fusion using dictionary-based sparse representation. Inf. Fusion 2015, 25, 72–84. [Google Scholar] [CrossRef]
- Zhang, H.; Le, Z. MFF-GAN: An unsupervised generative adversarial network with adaptive and gradient joint constraints for multi-focus image fusion. Inf. Fusion 2021, 66, 40–53. [Google Scholar] [CrossRef]
- Xu, H.; Ma, J.; Le, Z. FusionDN: A unified densely connected network for image fusion. In Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI), New York, NY, USA, 7–12 February 2020; Volume 34, pp. 12484–12491. [Google Scholar]
- Xu, H.; Ma, J.; Jiang, J. U2Fusion: A unified unsupervised image fusion network. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 502–518. [Google Scholar] [CrossRef]
- Zhang, Y.; Xiang, W. Local extreme map guided multi-modal brain image fusion. Front. Neurosci. 2022, 16, 1055451. [Google Scholar] [CrossRef]
- Hu, X.; Jiang, J.; Liu, X.; Ma, J. ZMFF: Zero-shot multi-focus image fusion. Inf. Fusion 2023, 92, 127–138. [Google Scholar] [CrossRef]
- Li, J.; Zhang, J.; Yang, C.; Liu, H.; Zhao, Y.; Ye, Y. Comparative analysis of pixel-level fusion algorithms and a new high-resolution dataset for SAR and optical image fusion. Remote Sens. 2023, 15, 5514. [Google Scholar] [CrossRef]
- Li, L.; Ma, H.; Jia, Z. Multiscale geometric analysis fusion-based unsupervised change detection in remote sensing images via FLICM model. Entropy 2022, 24, 291. [Google Scholar] [CrossRef] [PubMed]
- Li, L.; Ma, H.; Zhang, X.; Zhao, X.; Lv, M.; Jia, Z. Synthetic aperture radar image change detection based on principal component analysis and two-level clustering. Remote Sens. 2024, 16, 1861. [Google Scholar] [CrossRef]
- Li, L.; Ma, H.; Jia, Z. Change detection from SAR images based on convolutional neural networks guided by saliency enhancement. Remote Sens. 2021, 13, 3697. [Google Scholar] [CrossRef]
- Li, L.; Ma, H.; Jia, Z. Gamma correction-based automatic unsupervised change detection in SAR images via FLICM model. J. Indian Soc. Remote Sens. 2023, 51, 1077–1088. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).