Multi-Focus Image Fusion via Distance-Weighted Regional Energy and Structure Tensor in NSCT Domain

In this paper, a multi-focus image fusion algorithm based on distance-weighted regional energy and the structure tensor in the non-subsampled contourlet transform domain is introduced. A distance-weighted regional energy-based fusion rule is used to process the low-frequency components, and a structure tensor-based fusion rule is used to process the high-frequency components; the fused sub-bands are then integrated with the inverse non-subsampled contourlet transform to generate the fused multi-focus image. We conducted a series of simulations and experiments on the public multi-focus image dataset Lytro; the experimental results on 20 sets of data show that our algorithm has significant advantages over state-of-the-art algorithms and produces clearer and more informative fused images.


Introduction
In order to obtain richer and more useful information and to improve the completeness and accuracy of scene description, multi-focus images obtained from video sensors are usually fused [1,2]. Multi-focus image fusion uses effective information processing methods to fuse clear and focused information from different images, resulting in a high-definition fully focused image [3-5]. Figure 1 depicts an example of multi-focus image fusion.

After many years of extensive and in-depth research, image fusion methods have made significant progress. Image fusion approaches can be divided into three categories: transform domain-based, edge-preserving filtering-based, and deep learning-based algorithms [6-9]. The most widely used transform domain-based methods are the curvelet transform [10], contourlet transform [11,12], shearlet transform [13,14], etc. Kumar et al. [15] constructed an intelligent multi-modal image fusion technique utilizing the fast discrete curvelet transform and type-2 fuzzy entropy, and the fusion results demonstrate the efficiency of this model in terms of subjective and objective assessment. Kumar et al. [16] constructed an image fusion technique via an improved multi-objective meta-heuristic algorithm with fuzzy entropy in the fast discrete curvelet transform domain; a comparison of the developed methodology with state-of-the-art models shows enhanced performance with respect to visual quality assessment. Li et al. [17] integrated the curvelet and discrete wavelet transforms for multi-focus image fusion and obtained advanced fusion performance. Zhang et al. [18] constructed an image fusion model using the contourlet transform, in which an average-based fusion rule and a region energy-based fusion rule are applied to the low- and high-frequency sub-bands, respectively; the fused image is analyzed using fast non-local clustering for multi-temporal synthetic aperture radar image change detection, and this technique generates state-of-the-art change detection performance on both small- and large-scale datasets. Li et al. [19] constructed an image fusion approach using sparse representation and local energy in the shearlet transform domain; this method obtains good fusion performance. Hao et al. [20] introduced a multi-scale decomposition optimization-based image fusion method via gradient-weighted local energy: the non-subsampled shearlet transform is utilized to decompose the source images into low- and high-frequency sub-images, the low-frequency components are divided into base layers and texture layers, the base layers are merged according to an intrinsic attribute-based energy fusion rule, and a structure tensor-based gradient-weighted local energy operator fusion rule is utilized to merge the texture layers and high-frequency sub-bands. This fusion method achieves superior fusion capability in both qualitative and quantitative assessments.
Edge-preserving filtering-based methods also hold an important position in the field of image fusion, such as guided image filtering [21], cross bilateral filtering [22], Gaussian curvature filtering [23], sub-window variance filtering [24], etc. Feng et al. [24] constructed a multi-channel dynamic threshold neural P systems-based image fusion method via a visual saliency map in the sub-window variance filter domain; this algorithm obtains effective fusion performance in both visual quality and quantitative evaluations. Zhang et al. [25] introduced a double joint edge preservation filter-based image fusion technique via a non-globally saliency gradient operator, which obtains excellent subjective and objective performance. Jiang et al. [26] introduced an image fusion approach utilizing an entropy measure between intuitionistic fuzzy sets jointly with Gaussian curvature filtering.
Deep learning-based image fusion approaches have also generated good performance, and these models can be divided into supervised and unsupervised approaches [27,28]. In terms of supervised algorithms, several effective fusion methods have been developed. The deep convolutional neural network introduced by Liu et al. [29] is used for multi-focus image fusion and generates good fusion results. The multi-scale visual attention deep convolutional neural network introduced by Lai et al. [30] is utilized to fuse multi-focus images. Wang et al. [31] introduced weakly supervised image fusion with modal synthesis and enhancement. The self-supervised residual feature learning model constructed by Wang et al. [32] is used to fuse multi-focus images. An attention mechanism-based image fusion approach via supervised learning was constructed by Jiang et al. [33]. In terms of unsupervised algorithms, Jin et al. [34] introduced a Transformer and U-Net combination for image fusion. Zhang et al. [35] presented an unsupervised generative adversarial network with adaptive and gradient joint constraints for image fusion. Liu et al. [36] constructed a generative adversarial network-based unsupervised back project dense network for image fusion. These deep learning-based fusion methods also achieve good performance.
In order to improve the fusion performance of multi-focus images, a distance-weighted regional energy-based multi-focus image fusion algorithm via the structure tensor in the non-subsampled contourlet transform domain is constructed. The distance-weighted regional energy-based fusion rule and the structure tensor-based fusion rule are used to merge the low- and high-frequency sub-images, respectively. The experimental results on the Lytro dataset show that the proposed fusion method produces fusion effects that are superior to other traditional and deep learning methods in terms of both subjective visual and objective evaluation.

NSCT
The non-subsampled contourlet transform (NSCT) is a multi-directional, multi-scale, and translation-invariant transformation [12]. Its basic framework consists of two parts: the non-subsampled pyramid (NSP) decomposition mechanism and the non-subsampled directional filter bank (NSDFB) decomposition mechanism. The NSCT first utilizes NSP decomposition to perform multi-scale decomposition on the source image and then uses NSDFB decomposition to further decompose the high-frequency components directionally, ultimately obtaining sub-band images of the source image at different scales and in different directions [12]. Figure 2 shows the NSCT basic framework diagram and the NSCT frequency domain partition diagram.
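No standard Python implementation of the NSCT filter banks ships with common libraries, so the sketch below imitates only the shift-invariant, multi-scale behaviour of the NSP stage using Gaussian low-pass filters and omits the NSDFB directional stage entirely; the function name and smoothing kernels are illustrative assumptions, not the paper's actual filters.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def undecimated_pyramid(img, levels=4):
    """Illustrative stand-in for the NSP stage of the NSCT (not the real
    filter bank): successive low-pass filtering WITHOUT subsampling yields
    one low-frequency band and `levels` full-resolution detail bands, so
    every band has the same size as the input (shift invariance)."""
    low = img.astype(np.float64)
    highs = []
    for l in range(levels):
        smooth = gaussian_filter(low, sigma=2.0 ** l)  # coarser at each level
        highs.append(low - smooth)  # detail (high-frequency) band
        low = smooth
    return low, highs  # low-frequency component + high-frequency components
```

Because no band is subsampled, `low + sum(highs)` exactly recovers the input, which is the property a true inverse NSCT provides through its synthesis filter bank.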

Structure Tensor
For a multi-channel image $f(x, y) = (f_1(x, y), f_2(x, y), \ldots, f_m(x, y))$, grayscale images and color images are the two special cases $m = 1$ and $m = 3$, respectively. The square of the variation of $f(x, y)$ at position $(x, y)$ in direction $\theta$ for any $\varepsilon \to 0^+$ can be given by the following [37]:

$$\left\| f(x + \varepsilon \cos\theta,\, y + \varepsilon \sin\theta) - f(x, y) \right\|^2 \approx \varepsilon^2 C(\theta)$$

The rate of change $C(\theta)$ of the image $f(x, y)$ at position $(x, y)$ can be expressed as follows:

$$C(\theta) = E \cos^2\theta + 2F \cos\theta \sin\theta + G \sin^2\theta = (\cos\theta,\, \sin\theta)\, P\, (\cos\theta,\, \sin\theta)^T$$

where the second-moment positive semi-definite matrix

$$P = \begin{pmatrix} E & F \\ F & G \end{pmatrix}$$

is called the structure tensor, and $E$, $F$, and $G$ are defined as

$$E = \sum_{i=1}^{m} \left( \frac{\partial f_i}{\partial x} \right)^2, \qquad F = \sum_{i=1}^{m} \frac{\partial f_i}{\partial x} \frac{\partial f_i}{\partial y}, \qquad G = \sum_{i=1}^{m} \left( \frac{\partial f_i}{\partial y} \right)^2$$

The structure tensor-based focus detection operator (STO) is built from $\lambda_1$ and $\lambda_2$, the eigenvalues of the structure tensor, which can be computed by the following:

$$\lambda_{1,2} = \frac{(E + G) \pm \sqrt{(E - G)^2 + 4F^2}}{2}$$
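As a concrete illustration of these definitions, the sketch below computes $E$, $F$, and $G$ from finite-difference gradients of a single-channel image ($m = 1$) and derives the eigenvalues in closed form. Using the eigenvalue sum (the tensor trace) as the saliency value is an assumption for illustration; the paper's exact STO in Equation (7) may combine $\lambda_1$ and $\lambda_2$ differently.

```python
import numpy as np

def structure_tensor_sto(img):
    """Structure tensor entries E, F, G from image gradients, their
    eigenvalues, and a per-pixel saliency map built from them."""
    f = img.astype(np.float64)
    fy, fx = np.gradient(f)              # partial derivatives (axis 0 = y)
    E, F, G = fx * fx, fx * fy, fy * fy  # tensor entries for m = 1
    # Closed-form eigenvalues of the 2x2 matrix [[E, F], [F, G]]:
    root = np.sqrt((E - G) ** 2 + 4.0 * F * F)
    lam1 = 0.5 * (E + G + root)
    lam2 = 0.5 * (E + G - root)
    return lam1 + lam2                   # assumed saliency: trace = lam1 + lam2
```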

The Proposed Multi-Focus Image Fusion Method
The multi-focus image fusion method based on the distance-weighted regional energy and structure tensor in NSCT domain is proposed in this section, and the schematic of the proposed algorithm is depicted in Figure 3. This fusion method can be divided into four parts: NSCT decomposition, low-frequency components fusion, high-frequency components fusion, and the inverse NSCT. More details are given in the following subsections.

NSCT Decomposition
The multi-focus images A and B are decomposed into low-frequency components and high-frequency components through the NSCT, and the corresponding coefficients are denoted as $L_A$, $H_A^{l,k}$ and $L_B$, $H_B^{l,k}$, respectively, where $l$ and $k$ index the decomposition scale and direction.

Low-Frequency Components Fusion
Low-frequency components contain most of the energy and information in the images. In this section, the distance-weighted regional energy (DWRE)-based rule is used to merge the low-pass sub-bands, and it is defined as follows [38]:

$$\mathrm{DWRE}_{L_X}(i, j) = \sum_{m=-1}^{1} \sum_{n=-1}^{1} W(m + 2,\, n + 2)\, L_X(i + m,\, j + n)^2$$

where $W$ is a $3 \times 3$ matrix that allocates weights to the neighboring coefficients. Each entry of $W$ is the reciprocal of 1 + the distance of the respective position from the center, i.e., $W(2, 2)$, which gives the following:

$$W = \begin{pmatrix} \frac{1}{1+\sqrt{2}} & \frac{1}{2} & \frac{1}{1+\sqrt{2}} \\ \frac{1}{2} & 1 & \frac{1}{2} \\ \frac{1}{1+\sqrt{2}} & \frac{1}{2} & \frac{1}{1+\sqrt{2}} \end{pmatrix}$$

The fused low-frequency component $L_F(i, j)$ is constructed by the following:

$$L_F(i, j) = \begin{cases} L_A(i, j), & \mathrm{DWRE}_{L_A}(i, j) \ge \mathrm{DWRE}_{L_B}(i, j) \\ L_B(i, j), & \text{otherwise} \end{cases}$$

where $\mathrm{DWRE}_{L_X}(i, j)$, $X \in \{A, B\}$, is the DWRE of $L_X(i, j)$ estimated over a $3 \times 3$ neighborhood centered at the $(i, j)$th coefficient.
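A minimal sketch of this rule follows, under two stated assumptions: the distances in $W$ are Euclidean, and the "energy" term squares the coefficients, as is conventional for regional energy. The helper names `dwre` and `fuse_low` are hypothetical.

```python
import numpy as np
from scipy.ndimage import convolve

# 3x3 weight matrix: each entry is 1 / (1 + Euclidean distance from the
# centre W(2,2)), i.e. 1 for the centre, 1/2 for edges, 1/(1+sqrt(2)) corners.
d = np.sqrt(np.add.outer(np.arange(-1, 2) ** 2, np.arange(-1, 2) ** 2))
W = 1.0 / (1.0 + d)

def dwre(low):
    """Distance-weighted regional energy over a 3x3 neighbourhood,
    assuming squared coefficients as the energy term."""
    return convolve(low ** 2, W, mode="nearest")

def fuse_low(la, lb):
    """Per pixel, keep the low-frequency coefficient with the larger DWRE."""
    return np.where(dwre(la) >= dwre(lb), la, lb)
```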

High-Frequency Components Fusion
High-frequency components contain more details and noise in the images. In this section, the structure tensor-based focus detection operator (STO) fusion rule is used to process the high-frequency components, and the fused coefficients $H_F^{l,k}(i, j)$ can be computed by the following:

$$H_F^{l,k}(i, j) = \begin{cases} H_A^{l,k}(i, j), & \widetilde{M}^{l,k}(i, j) = 1 \\ H_B^{l,k}(i, j), & \text{otherwise} \end{cases}$$

with the initial focus decision map

$$M^{l,k}(i, j) = \begin{cases} 1, & S_A^{l,k}(i, j) \ge S_B^{l,k}(i, j) \\ 0, & \text{otherwise} \end{cases}$$

where $H_X^{l,k}(i, j)$ is the high-frequency coefficient of image $X \in \{A, B\}$ at the $l$-th scale and $k$-th direction at location $(i, j)$, and $S_X^{l,k}(i, j)$ is the structure salient image of $H_X^{l,k}(i, j)$ computed by the STO. The map $\widetilde{M}^{l,k}(i, j)$ is obtained from $M^{l,k}(i, j)$ by the classical consistency verification operator (Equation (14)), a majority vote over the neighborhood $\Omega_1$, which can improve the robustness and rectify some wrong focusing detections in $M^{l,k}(i, j)$.
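The following sketch implements this selection-plus-verification scheme for one pair of sub-bands, assuming the consistency verification is a majority vote over a $5 \times 5$ window (matching the $\Omega_1$ used later in the experiments); the function name and signature are illustrative.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def fuse_high(ha, hb, sa, sb, win=5):
    """Fuse one pair of high-frequency sub-bands ha/hb given their
    structure salient images sa/sb from the STO."""
    m = (sa >= sb).astype(np.float64)      # initial focus decision map M
    m = uniform_filter(m, size=win) > 0.5  # majority vote = consistency check
    return np.where(m, ha, hb)             # pick A where A is judged in focus
```

The majority vote flips isolated, likely erroneous decisions to agree with their neighbourhood, which is exactly the robustness gain the consistency verification is meant to provide.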

Inverse NSCT
The final fused image $F$ is generated by the inverse NSCT performed on the fused low- and high-frequency components $L_F$ and $H_F^{l,k}$:

$$F = \mathrm{NSCT}^{-1}\left( L_F,\, \{ H_F^{l,k} \} \right)$$

The main steps of the proposed multi-focus image fusion approach are summarized in Algorithm 1:

Algorithm 1. The proposed multi-focus image fusion method.
1. Decompose each source image $X \in \{A, B\}$ into $L_X$ and $H_X^{l,k}$ with the NSCT;
2. Fuse the low-frequency components with the DWRE-based rule to obtain $L_F$;
3. For each source image $X \in \{A, B\}$, calculate the structure salient image $S_X^{l,k}(i, j)$ of $H_X^{l,k}(i, j)$ with the STO using Equation (7);
4. Calculate the consistency verification using Equations (14) and (15) and fuse the high-frequency components to obtain $H_F^{l,k}$;
5. Reconstruct the fused image $F$ with the inverse NSCT.
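Putting the pieces together, this end-to-end sketch mirrors Algorithm 1 using the helper functions defined in the preceding sections; because `undecimated_pyramid` is only a stand-in for the NSCT (no directional sub-bands), reconstruction here reduces to summing the bands rather than applying a true inverse NSCT.

```python
import numpy as np

def fuse_multifocus(img_a, img_b, levels=4):
    """End-to-end sketch of Algorithm 1 under the stand-in decomposition."""
    la, ha = undecimated_pyramid(img_a, levels)   # 'NSCT' decomposition of A
    lb, hb = undecimated_pyramid(img_b, levels)   # 'NSCT' decomposition of B
    lf = fuse_low(la, lb)                         # DWRE rule, low-frequency band
    hf = [fuse_high(a, b, structure_tensor_sto(a), structure_tensor_sto(b))
          for a, b in zip(ha, hb)]                # STO rule, each detail band
    return lf + sum(hf)                           # additive 'inverse' transform
```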

Experimental Results and Discussions
In this section, the Lytro dataset with twenty image pairs (Figure 4) is used for the experiments, and eight state-of-the-art fusion algorithms are compared with our method, namely, multi-focus image fusion using a bilateral-gradient-based sharpness criterion (BGSC) [39], the non-subsampled contourlet transform and fuzzy-adaptive reduced pulse-coupled neural network (NSCT) [40], multi-focus image fusion in gradient domain (GD) [41], the unified densely connected network for image fusion (FusionDN) [42], the fast unified image fusion network based on the proportional maintenance of gradient and intensity (PMGI) [43], image fusion based on target-enhanced multi-scale transform decomposition (TEMST) [44], the unified unsupervised image fusion network (U2Fusion) [45], and zero-shot multi-focus image fusion (ZMFF) [3]. Subjective and objective assessments are both used; for the objective assessment, eight metrics are adopted: the edge-based similarity measurement $Q^{AB/F}$ [46], the feature mutual information metric $Q_{FMI}$ [47], the gradient-based metric $Q_G$ [48], the structural similarity-based metric $Q_E$ [48], the phase congruency-based metric $Q_P$ [48], the structural similarity-based metric $Q_Y$ introduced by Yang et al. [48], the average gradient metric $Q_{AG}$ [22], and the average pixel intensity metric $Q_{API}$ [22]. The larger the values of these indicators, the better the fusion effect. In the proposed method, the size of $\Omega_1$ is set to $5 \times 5$; 'pyrexc' and 'vk' are used as the pyramid filter and directional filter, respectively; and four decomposition levels with 2, 2, 2, and 2 directions from the coarser scale to the finer scale are used.
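Of these metrics, the two intensity-based ones have simple closed forms; the sketch below follows their common definitions (assumed to match those of [22]): the average gradient is the mean local gradient magnitude, and the average pixel intensity is the image mean.

```python
import numpy as np

def q_ag(img):
    """Average gradient (Q_AG): mean of sqrt((dx^2 + dy^2) / 2) over all
    pixels; a larger value indicates a crisper, more detailed image."""
    gy, gx = np.gradient(img.astype(np.float64))
    return float(np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2.0)))

def q_api(img):
    """Average pixel intensity (Q_API): the image mean, i.e. brightness."""
    return float(np.mean(img))
```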

Qualitative Comparisons
In this section, we selected five sets of data from the Lytro dataset for result display; the qualitative comparisons of the different methods are shown in Figures 5-9, with some areas enlarged for easy observation and comparison. In Figure 5, we can see that the fusion result generated by the BGSC method has a weak far-focus fusion effect and that black spots appear around the building; the fusion results computed by the NSCT, GD, FusionDN, PMGI, and TEMST methods are somewhat blurry; the U2Fusion method generates a dark fusion image, which limits the observation of some areas; and the ZMFF method produces a fully focused result. However, our result offers moderate brightness and clear edges; in particular, the red lock in the image is rendered brighter and more clearly than in the ZMFF result.

In Figure 6, we can see that the fusion results generated by the BGSC and PMGI methods are blurry, making it difficult to observe the detailed information in the fused images; the fusion images computed by the NSCT, FusionDN, TEMST, and ZMFF methods are similar, all yielding essentially fully focused results; the GD method generates a high-brightness fusion image in which part of the color information is lost; the U2Fusion method generates a sharpened fused image in which some areas have lost information, such as the branches of the tree on the right side and the branches of the distant small tree, all of which appear darker, which is not conducive to obtaining information from a fully focused image; and the fusion image calculated by the proposed method is an all-focused image, with the details and edge information well preserved, moderate brightness, and ease of observation.

In Figure 7, we can note that the fusion images calculated by the BGSC, GD, FusionDN, PMGI, and TEMST methods appear blurry, the images generated by BGSC and TEMST especially exhibiting severe distortion; the image computed by the NSCT method shows a basically full-focus result with some artifacts that make it appear to have texture overlays; the U2Fusion method generates a dark image; and the fusion image calculated by the ZMFF method is very close to the result obtained by our algorithm. However, our fusion effect is better, with a clearer image and almost seamless integration of all the information into the fully focused image.

In Figure 8, we can see that the fused images computed by the BGSC, FusionDN, and PMGI methods appear blurry, especially the distant forest, making the details difficult to observe; the images generated by the NSCT, TEMST, and ZMFF methods are similar; the fused image obtained by the GD method has high brightness and clarity; the fused image computed by the U2Fusion method has some dark regions, especially around the mouth of the horse, where the loss of information is severe; and the fused image achieved by the proposed method has moderate brightness and retains more image information.

In Figure 9, we can see that the fusion images obtained by the BGSC, FusionDN, and PMGI approaches exhibit a certain degree of blurring, which makes it difficult to observe the information in the images; the image generated by the BGSC method in particular does not achieve full focus, and the information on the left side of the image is severely lost; the NSCT, TEMST, and ZMFF methods all obtain basically fully focused images from which the entire scene can be observed; the GD method gives a high-brightness fusion image but also loses some details; the fused image achieved by the U2Fusion method has many darker areas, causing severe information loss (for example, the information about the girl's clothing and headscarf cannot be captured correctly); and the fusion image obtained by our algorithm fully preserves the information of the two source images. Additionally, it not only has moderate brightness, but the detailed information of the entire scene can also be fully observed and obtained.

Quantitative Comparisons
In this section, the quantitative comparisons of the different methods are shown in Tables 1-6 and Figure 10. In Table 1, the metrics $Q^{AB/F}$, $Q_{FMI}$, $Q_G$, $Q_E$, $Q_P$, $Q_Y$, and $Q_{AG}$ computed by the proposed method are the best; the metric $Q_{API}$ generated by the GD method is the best, but the corresponding result of our algorithm still ranks third. In Table 2, the metrics $Q^{AB/F}$, $Q_{FMI}$, $Q_G$, $Q_E$, $Q_P$, and $Q_Y$ generated by our method are the best; the metric $Q_{AG}$ generated by the U2Fusion method is the best, while our method ranks second, and the metric $Q_{API}$ generated by the GD method is the best, with our method ranking fourth. In Table 3, the metrics $Q^{AB/F}$, $Q_{FMI}$, $Q_G$, $Q_E$, $Q_P$, $Q_Y$, and $Q_{AG}$ computed by our method are the best; the metric $Q_{API}$ generated by the FusionDN method is the best, and our method ranks fourth. In Table 4, the metrics $Q^{AB/F}$, $Q_{FMI}$, $Q_G$, $Q_E$, $Q_P$, and $Q_Y$ generated by our method are the best; the metrics $Q_{AG}$ and $Q_{API}$ generated by the GD method are the best. In Table 5, the metrics $Q^{AB/F}$, $Q_{FMI}$, $Q_G$, $Q_E$, $Q_P$, $Q_Y$, and $Q_{AG}$ computed by our method are the best; the metric $Q_{API}$ computed by the FusionDN method is the best.

Figure 10 shows the objective experimental results of the different algorithms on the 20 sets of image pairs; the results of the same method on different image pairs are connected into a curve, with the average indicator value shown on the right side of each indicator graph. Table 6 reports the average quantitative assessment of the different methods in Figure 10. In Figure 10 and Table 6, we can see that the average metrics $Q^{AB/F}$, $Q_{FMI}$, $Q_G$, $Q_E$, $Q_P$, and $Q_Y$ generated by our method are the best; the average metrics $Q_{AG}$ and $Q_{API}$ generated by the U2Fusion and GD methods are the best, respectively, and the two corresponding average metrics computed by our method rank second.
Through rigorous qualitative and quantitative evaluation and analysis, the results show that our algorithm stands out in multi-focus image fusion. Compared with state-of-the-art algorithms, it achieves the best fusion effect, with the advantages of rich information and clear images.

Conclusions
In this paper, a novel multi-focus image fusion method based on the distance-weighted regional energy and structure tensor in NSCT domain was proposed. The distance-weighted regional energy-based fusion rule was utilized to fuse the low-frequency sub-bands, and the structure tensor-based fusion rule was utilized to fuse the high-frequency sub-bands. The proposed method was evaluated on the Lytro dataset with 20 image pairs, and the fusion results demonstrate that our method generates state-of-the-art fusion performance in terms of image information, definition, and brightness, making the seamless fusion of multi-focus images possible. In future work, we will improve and extend the application of this algorithm to multi-modal image fusion so that it has better universality in image fusion.