Multi-Focus Image Fusion Based on Hessian Matrix Decomposition and Salient Difference Focus Detection

Multi-focus image fusion integrates multiple images of the same scene, each focused on a different region, to produce a fully focused image. However, accurately retaining the focused pixels in the fusion result remains a major challenge. This study proposes a multi-focus image fusion algorithm based on Hessian matrix decomposition and salient difference focus detection, which can effectively retain the sharp pixels in the focused region of a source image. First, the source image is decomposed using a Hessian matrix to obtain a feature map containing its structural information. A focus difference analysis scheme based on the improved sum of modified Laplacian is designed to effectively determine the focusing information at the corresponding positions of the structural feature map and the source image. In the decision-map optimization process, considering the variability of image sizes, an adaptive multiscale consistency verification algorithm is designed, which helps the final fused image effectively retain the focusing information of the source image. Experimental results show that our method performs better than several state-of-the-art methods in both subjective and quantitative evaluations.


Introduction
The limited depth of field of a camera causes it to focus on only one area of a scene, making it impossible to keep all objects in focus simultaneously. Multi-focus image fusion (MFIF) [1] aims to integrate multiple images of the same scene captured at different focal settings into a single all-in-focus image, which can be considered a collection of optimally focused pixels extracted from a set of source images. Fused images can provide a more comprehensive, objective, and reasonable interpretation of a scene than single-source images. Currently, MFIF is widely used in various fields, including microscopic imaging, weather phenomenon detection, and intelligent surveillance. MFIF methods generally operate at the pixel level and can be broadly classified into deep-learning-, transform domain-, and spatial domain-based methods.
Recently, deep-learning techniques have become popular in computer vision [2], and several deep-learning-based image fusion methods have emerged [3][4][5][6]. Convolutional neural network (CNN)- and generative adversarial network (GAN)-based fusion frameworks are two mainstream multi-focus image fusion frameworks. Liu et al. [7] first introduced CNNs into the multi-focus fusion task. Their scheme generates a focus decision map by learning a binary classifier to determine whether each pixel is focused, and optimizes this map using postprocessing methods, such as consistency verification, to improve the quality of the fusion results.
Amin-Naji et al. [6] proposed an ensemble-learning-based method that directly generates the final decision map by combining the decision maps of different models, reducing the postprocessing steps and improving the computational efficiency of the fusion algorithm. Several spatial domain-based methods have also been proposed recently [21][22][23][24] to solve this problem and to consider the spatial information of the source image more comprehensively.
In block-based algorithms, the image is first decomposed into equal-sized blocks, and a focus measurement (FM) is then used to determine the focused blocks. However, choosing an appropriate block size is a major challenge: larger blocks may contain pixels from both focused and out-of-focus regions, while smaller blocks are not conducive to reliable FM. To address these limitations, a previous study [23] used quadtree decomposition in multi-focus image fusion.
Region-based algorithms [25] generally segment the source image into different regions before fusion. Although they can avoid the block effect to a certain extent, when a region is incorrectly partitioned, erroneous segmentation results are generated that degrade the quality of the fusion results. Regardless of the type of algorithm, block- or region-based, the goal is to determine the focus characteristics of each pixel in the most appropriate way. Therefore, FM plays a critical role in determining whether a pixel is in or out of focus. It depends heavily on the edge details and textural information of the image, and may yield suboptimal results if performed directly on the source image.
In addition, due to the diversity and complexity of images, the FM's judgment of pixel focus characteristics can be incorrect, which usually occurs at the boundary between the focused and out-of-focus regions of the image. This also leads to artifacts at the boundaries of the fusion results of several sophisticated algorithms. To accurately determine the focusing characteristics of pixels, this study proposes a new multi-focus image fusion method.
First, the Hessian matrix is introduced to decompose the source image and obtain its Hessian-matrix feature map, which highlights the significant information. Meanwhile, to accurately locate the focused pixels and avoid interference from out-of-focus pixels at the focus boundary, a focus detection method based on a salient-difference analysis strategy is proposed. This method can effectively detect pixels with significant activity using the saliency difference between pixels, so that the focus information of the source images can be effectively integrated into the fused image.
The contributions of this study are as follows:

1.
A simple yet effective multi-focus image fusion method based on the Hessian-matrix with salient-difference focus detection is proposed.

2.
A pixel salient-difference maximization and minimization analysis scheme is proposed to weaken the influence of pixels with similar activity levels at the focus boundary. It can effectively distinguish pixels in the focus and out-of-focus regions and produce high-quality focus decision maps.

3.
An adaptive multiscale-consistency-verification scheme is designed in the postprocessing stage, which can adaptively optimize the initial decision maps of different sizes, solving the limitations caused by fixed parameters.
The remainder of this paper is organized as follows: Section 2 introduces the Hessian matrix. Section 3 details the proposed multi-focus image fusion algorithm based on the Hessian matrix and focus difference analysis. Section 4 presents the experimental results and comparative experiments. Finally, the conclusions of the study are presented in Section 5.

Related Works Hessian Matrix and Image Decomposing
In multi-focus images, the focused areas contain more significant information, such as edges, structures, and textures, than the blurred areas. Generally, significant information is detected in the source image before constructing the focus decision map. Most FM-based algorithms are sensitive to this edge and detail information. Therefore, how to effectively detect the detailed information within the focused region is the central problem of multi-focus image fusion research. Xiao et al. [26] proposed an image-decomposition strategy based on a multiscale Hessian matrix to make FM perform better and to reduce image blurring and pseudo-edge problems. The feature map obtained after decomposing the image with this matrix clearly expresses the feature information of the source image and facilitates focus detection.
Source images f_A and f_B are used as the input, for which the Hessian matrix is defined as follows:

H(x, y, σ) = [ L_xx(x, y, σ)  L_xy(x, y, σ); L_xy(x, y, σ)  L_yy(x, y, σ) ]        (1)

where L_xx(x, y, σ) is the convolution of the Gaussian second-order partial derivative ∂²g(σ)/∂x² with image f at the point (x, y), and similarly for L_xy(x, y, σ) and L_yy(x, y, σ). As a Hessian matrix can extract features with rotational invariance from images at multiple scales [27], it can be extended to a multiscale Hessian matrix (MSH) weighted over different scales as follows:

MSH(x, y) = Σ_{j=1}^{n} ω_j H(x, y, σ_j)        (2)

where j indexes the scale, n is the number of scales, and ω_j is the weight at scale j. Inspired by reference [25], in this study we take the weights ω_1 = 0.8 for scale σ_1 = 0.4 and ω_2 = 0.2 for scale σ_2 = 1.2. Based on Equation (2), the feature image FIM(x, y) of a source image is extracted by keeping only the pixels whose MSH response exceeds the threshold λ (Equation (3)); we set λ = 0.0002. For more information about the Hessian matrix, see [26].
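The multiscale Hessian decomposition above can be sketched in Python using Gaussian second derivatives. Note that collapsing the 2×2 Hessian to its per-pixel determinant and the simple thresholding step are assumptions made for illustration, not the paper's exact Equation (3):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def hessian_response(img, sigmas=(0.4, 1.2), weights=(0.8, 0.2)):
    """Weighted multiscale Hessian response built from Gaussian second
    derivatives (in the spirit of Equation (2)); using det(H) as the
    scalar response is an illustrative assumption."""
    resp = np.zeros_like(img, dtype=float)
    for s, w in zip(sigmas, weights):
        lxx = gaussian_filter(img, s, order=(0, 2))  # d^2/dx^2
        lyy = gaussian_filter(img, s, order=(2, 0))  # d^2/dy^2
        lxy = gaussian_filter(img, s, order=(1, 1))  # d^2/(dx dy)
        resp += w * (lxx * lyy - lxy ** 2)           # det(H) per pixel
    return resp

# Toy source image: a bright square on a dark background.
img = np.zeros((32, 32))
img[12:20, 12:20] = 1.0
# Feature mask with lambda = 0.0002, as in the text.
fim_mask = np.abs(hessian_response(img)) > 0.0002
```

The mask is nonzero only where the square's edges and corners produce a strong structural response, which is the behavior the feature map is meant to capture.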

Overview
In this section, we present a detailed introduction to the multi-focus image fusion method based on a Hessian matrix with significant difference focus detection. Figure 1 depicts the general framework of fusion to better illustrate the general flow of the proposed algorithm. The proposed algorithm consists of four important stages: significant information representation, pixel salient-difference analysis, focus decision map optimization, and fusion result reconstruction. As shown in Figure 1, we decompose the source image in the first stage using a multiscale Hessian matrix to obtain the feature region map; subsequently, we use ISML and ML to process the feature region map to obtain four salient maps reflecting different information. In the second stage, we process the four salient maps to obtain the initial focus decision map using the pixel salient-difference analysis scheme; the detailed procedure is given in Section 3.3. In the third stage, the "bwareaopen" fill filter and the adaptive multiscale consistency verification algorithm are used to optimize the initial decision map and increase its accuracy in determining the focus properties of the pixels. In the final stage, the reconstruction of the fusion result is completed after the final decision map is obtained.

Figure 1. Flowchart of the proposed algorithm.


Significant Information Expression
The goal of multi-focus image fusion is to synthesize the focusing information from each source image; therefore, to obtain a fully focused image, the focused pixels must be accurately determined. Typically, in multi-focus images, pixels in the focused area tend to be more prominent than those in the out-of-focus area. Therefore, we can first obtain the saliency map of the source image and then determine the degree of focus of each pixel by judging its saliency, yielding the saliency decision map corresponding to the source image. The sum of modified Laplacian (SML) is an effective tool for representing the significant information in images. The modified Laplacian (ML) is expressed as follows:

ML(x, y) = |2f(x, y) − f(x − step, y) − f(x + step, y)| + |2f(x, y) − f(x, y − step) − f(x, y + step)|

where step denotes the step size, i.e., a variable spacing between pixels used to calculate the ML and thus adapt to changes in texture size; it can usually be set to 1. The SML is defined as the sum of ML values over a square window:

SML(x, y) = Σ_{i=−α}^{α} Σ_{j=−α}^{α} ML(x + i, y + j)

where α is the radius of the window, set to 3, and only ML values above a threshold (set to 0) are accumulated. Traditional ML only considers the pixels around the central one in the horizontal and vertical directions. Kong et al. [28] improved the traditional ML by also considering the four diagonal neighbors, which contain critical information; adding the corresponding diagonal difference terms to the ML yields the improved ML, and summing it over the same window yields the improved SML (ISML). The ISML value of each pixel of the feature region map FIM obtained by Equation (3) is expressed as follows:

M_i(x, y) = ISML(FIM_i(x, y)),  i ∈ {A, B}

where FIM_A and FIM_B are the feature region maps obtained by using the Hessian matrix to decompose the source images f_A and f_B, respectively, and M_A and M_B are the saliency maps of FIM_A and FIM_B, respectively.
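As a concrete sketch, the ML and SML computations described above can be implemented as follows. The window radius and step values follow the text; handling borders by edge replication is an assumption:

```python
import numpy as np

def ml(img, step=1):
    """Modified Laplacian with a variable step, borders padded by
    edge replication (an illustrative choice)."""
    f = np.pad(img.astype(float), step, mode='edge')
    c = f[step:-step, step:-step]                      # center pixel
    up, dn = f[:-2 * step, step:-step], f[2 * step:, step:-step]
    lf, rt = f[step:-step, :-2 * step], f[step:-step, 2 * step:]
    return np.abs(2 * c - up - dn) + np.abs(2 * c - lf - rt)

def sml(img, radius=3, step=1):
    """Sum of the modified Laplacian over a (2*radius+1)^2 window."""
    m = ml(img, step)
    p = np.pad(m, radius, mode='edge')
    out = np.zeros_like(m)
    h, w = m.shape
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            out += p[radius + dy:radius + dy + h,
                     radius + dx:radius + dx + w]
    return out

# A vertical step edge has a strong ML response along the edge only.
edge = np.zeros((6, 6))
edge[:, 3:] = 1.0
```

On a constant image both measures are zero, while the step edge above responds only at the transition, which is exactly the focus-sensitive behavior the FM relies on.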

Pixel Salient Difference Analysis (PSDA)
First, for each pixel (x, y), the modified Laplacian value is calculated within a small window centered on it.
To find the most and least salient pixel points in the source image, the maximum and minimum ML maps over all source images are computed as follows:

S_max(x, y) = max(S_A(x, y), S_B(x, y))
S_min(x, y) = min(S_A(x, y), S_B(x, y))

where S_max(x, y) and S_min(x, y) are the maximum and minimum ML maps of all source images, respectively. Since ML reflects the focusing information of the image, and the salient information at each position of S_A(x, y) and S_B(x, y) is contained in S_max, S_max can be approximated as the saliency map of a fully focused image, while S_min corresponds to a fully out-of-focus one. The difference salient map (DSM) between S_max and S_min is calculated as follows:

DSM(x, y) = S_max(x, y) − S_min(x, y)

Meanwhile, the difference map (DM) between M_A and M_B is calculated by:

DM(x, y) = M_A(x, y) − M_B(x, y)

As observed in the ISML maps in Figure 2, the significant information in the source image can be effectively extracted using ISML. The DM is the difference map between the two ISML maps. Only the focused region of the source image is retained in the DM, which can be clearly observed at the boundary between the focused and out-of-focus areas. Although this scheme can effectively detect the focused pixels, the DM also picks up some false pixel information at the boundary of the salient map. Figure 3 illustrates the intermediate process of pixel salient-difference analysis. S_max and S_min in the figure represent the salient pixel information of fully focused and fully out-of-focus images, respectively, whereas the DSM reflects the maximum difference of salient pixel information within the source image. Therefore, by comparing DM and DSM, we can judge the salience of pixels in the source image and thus infer their focusing characteristics. However, in the DM, M_A and M_B are the ISML maps of the source image, and the difference between them is not as evident as that between S_max and S_min. We propose a rule that comprehensively considers DSM and DM to obtain the initial decision map (IDM) (Equation (16)), where µ is a custom threshold that scales the DSM before it is compared with the DM.
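The salient-difference analysis can be sketched as follows. The comparison rule `dm > mu * dsm` is one plausible reading of Equation (16), whose exact form is not reproduced here, and for simplicity the same toy saliency maps are used for both the difference maps and the max/min maps:

```python
import numpy as np

def initial_decision_map(m_a, m_b, mu=0.55):
    """Pixel salient-difference analysis sketch: compare the difference
    map (DM) against the scaled maximal salient difference (DSM)."""
    s_max = np.maximum(m_a, m_b)  # approximates a fully focused saliency map
    s_min = np.minimum(m_a, m_b)  # approximates a fully out-of-focus map
    dsm = s_max - s_min           # difference salient map (DSM)
    dm = m_a - m_b                # difference map (DM) between ISML maps
    # A pixel is judged focused in f_A only where DM clearly exceeds mu * DSM.
    return (dm > mu * dsm).astype(np.uint8)

m_a = np.array([[5.0, 1.0], [3.0, 3.0]])  # toy ISML map of f_A
m_b = np.array([[1.0, 5.0], [3.0, 3.0]])  # toy ISML map of f_B
idm = initial_decision_map(m_a, m_b)      # → [[1, 0], [0, 0]]
```

The scaling by µ is what suppresses pixels whose activity levels are similar in both sources, i.e., the ambiguous pixels near the focus boundary.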


Step 1. Small Area Removal Filter
Some wrongly selected pixel areas are inevitably present in the IDM. The IDMs in Figure 4 reveal a few small, isolated areas in the focus region, which consist of a few wrongly selected pixels. To solve this problem, we used the "bwareaopen" filter to eliminate the isolated areas or holes containing the erroneous pixels in the focus area, with the area threshold defined as follows:

TH = S / 45        (17)

where S represents the area of the source image. Equation (17) was used to eliminate the isolated areas in the IDM smaller than S/45 using the "bwareaopen" filter to obtain the middle decision map (MDM). Figure 4 illustrates the optimization process of the decision map. It reveals that, compared with the IDM, the MDM further corrects the wrongly selected pixels in the decision map and improves the focus detection accuracy.
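"bwareaopen" is a MATLAB function; a minimal Python stand-in using connected-component labeling is sketched below, applying the S/45 threshold from Equation (17) to a toy decision map:

```python
import numpy as np
from scipy import ndimage

def bwareaopen(mask, min_size):
    """Remove connected components smaller than min_size pixels
    (a minimal stand-in for MATLAB's bwareaopen)."""
    labeled, num = ndimage.label(mask)
    # Per-component pixel counts; label 0 is the background.
    sizes = ndimage.sum(mask, labeled, range(1, num + 1))
    keep = np.zeros(num + 1, dtype=bool)
    keep[1:] = sizes >= min_size
    return keep[labeled]

# Decision-map cleanup as in Step 1: drop isolated regions smaller than S/45.
idm = np.zeros((30, 30), dtype=bool)
idm[2:20, 2:20] = True      # large focused region (324 px)
idm[25:27, 25:27] = True    # isolated 4-px misclassified blob
S = idm.size                # image area: 900 px, so min_size = 20
mdm = bwareaopen(idm, S / 45)
```

The large focused region survives while the 4-pixel blob is removed, which is exactly the correction shown between the IDM and MDM in Figure 4.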

Step 2-Adaptive multiscale consistency verification
Meanwhile, considering the spatial consistency of objects, we used the consistency verification technique [29] to optimize the MDM. The traditional consistency verification technique uses a single window to determine whether a pixel lies in the focus region, e.g., by a majority vote over its neighborhood:

FDM(x, y) = 1, if Σ_{(p,q)∈δ} MDM(p, q) > |δ|/2;  0, otherwise

where δ denotes a square neighborhood window centered at (x, y) and |δ| is its number of pixels. However, this type of method uses a fixed window size and cannot effectively consider the pixel values at varying scales, which can easily cause pixel judgment errors in the boundary area between the focused and out-of-focus regions. Moreover, due to the diversity of images, a fixed window size may affect different decision maps differently, and can even seriously damage the focus region in the decision map by introducing a large area of erroneous pixels. Therefore, the selection of the window size is crucial for consistency verification. We propose an adaptive multiscale consistency verification scheme to solve this problem effectively: two windows of different sizes jointly determine whether a pixel is focused (Equation (18)), where δ_A and δ_B are two square neighborhood windows of different sizes centered at (m, n). Figure 4 shows that the erroneously selected pixels at the boundary of the MDM have been removed, and the focus boundary has become smooth and complete.
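A two-window majority-vote verification can be sketched as follows; the specific window sizes, and requiring both neighborhood majorities to agree, are assumptions standing in for the exact Equation (18):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def consistency_verification(mdm, sizes=(17, 21)):
    """Majority-vote consistency verification over two window sizes
    (sketch; the sizes derived from threshold T are assumptions)."""
    votes = [uniform_filter(mdm.astype(float), size=s) for s in sizes]
    # Keep a pixel as focused only if both neighborhood majorities agree.
    return np.logical_and(votes[0] > 0.5, votes[1] > 0.5)

mdm = np.zeros((40, 40), dtype=bool)
mdm[:, :20] = True     # left half judged focused
mdm[5, 30] = True      # isolated misclassified pixel in the right half
fdm = consistency_verification(mdm)
```

The isolated pixel loses both majority votes and is removed, while the interior of the focused half is preserved; using two scales makes the decision near the focus boundary less sensitive to any single window size.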

Fusion Result Reconstruction
With the final decision map FDM obtained, the fusion result is derived as follows:

F(x, y) = FDM(x, y) f_A(x, y) + (1 − FDM(x, y)) f_B(x, y)

where FDM(x, y) = 1 selects the pixel from f_A and FDM(x, y) = 0 selects the pixel from f_B.
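With a binary decision map, the reconstruction step reduces to a per-pixel selection; a minimal sketch:

```python
import numpy as np

def reconstruct(f_a, f_b, fdm):
    """Fuse two source images with a binary final decision map:
    FDM == 1 selects f_A, FDM == 0 selects f_B."""
    fdm = fdm.astype(f_a.dtype)
    return fdm * f_a + (1 - fdm) * f_b

f_a = np.array([[10, 20], [30, 40]])
f_b = np.array([[1, 2], [3, 4]])
fdm = np.array([[1, 0], [0, 1]])
fused = reconstruct(f_a, f_b, fdm)  # → [[10, 2], [3, 40]]
```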

Experimental Setup
The results were compared with 11 state-of-the-art fusion methods to verify the validity of the proposed method. Eight objective evaluation metrics were used for the quantitative analysis, and the specific experimental setup is explained below.

Image Datasets
We used two of the most popular publicly available datasets for testing, one of which is the "Lytro" dataset [30] of multi-focus color images (see Figure 5). The other is a classic grayscale multi-focus image dataset.

Compared Methods
To verify the effectiveness and advancement of the proposed method, we compared 11 current state-of-the-art methods, as follows:

•
Multi-focus image fusion based on NSCT and residual removal [12] (NSCT-RR).

Among them, NSCT-RR is a method combining the spatial and transform domains. MWGF, HMD, and GFDF are spatial domain methods based on focus detection, YMY and CSR are methods based on sparse representation (SR), and ECNN, MFF-SSIM, MFF-GAN, U2Fusion, and IFCNN are methods based on deep learning. The 11 methods cover various types of current multi-focus image fusion methods and, in a sense, represent the latest developments in the field. Therefore, the performance of the proposed method is validated by comparison with them. The parameter settings of all 11 methods were identical to those in the respective published literature.

Objective Evaluation Metrics
Objective evaluation has long been a challenge in image fusion, and no single metric can fully reflect the quality of the fusion results. Therefore, to comprehensively and effectively evaluate the fusion results of different algorithms and compare their fusion performance, we used eight popular quantitative evaluation metrics for multi-focus image fusion. More specifically, Q_MI measures the mutual information between the fused image and the source images, and Q_NCIE measures the nonlinear correlation information entropy between the fusion result and the source images; these two metrics are information theory-based. Q_G evaluates the amount of edge information, and Q_M is a multiscale fusion metric implemented with two-level Haar wavelets that measures the degree of edge preservation across scale spaces. Q_P is a phase congruency-based metric: since the moments contain corner and edge information of the image, the metric can be defined using the principal moments of the image phase coherence. AG represents the average gradient; a large value indicates that the fused image contains more gradient information, which means better fusion performance. Q_CB is constructed from local saliency, global saliency, and similarity, evaluating the fused image from the perspective of visual saliency. Both Q_CB and Q_CV are human perception-inspired fusion metrics. Together, these eight metrics evaluate the fusion performance of different methods in a comprehensive way, which makes our experiments more convincing.
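Of the eight metrics, AG is the simplest to state: it averages the local gradient magnitude. This particular discretization, using central differences via `np.gradient`, is an illustrative assumption rather than the exact formula used in the experiments:

```python
import numpy as np

def average_gradient(img):
    """Average gradient (AG): mean magnitude of the local gradient,
    a no-reference sharpness proxy used as a fusion metric."""
    gy, gx = np.gradient(img.astype(float))
    return float(np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2)))
```

A perfectly flat image scores 0, and sharper fused images, which carry more edge and texture detail, score higher.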
For Q_CV, a smaller value indicates better performance, while a larger value means better performance for the remaining seven metrics.

Parametric Analysis
In the proposed method, two key parameters were analyzed: the threshold µ in Equation (16) and the threshold T in Equation (18) that controls the sizes of windows δ_A and δ_B. To find appropriate parameters, we selected a set of images from Lytro as the test images (Figure 7). First, we set the threshold T to 17 and varied µ. We observed that the boundary of the focus region became blurred when µ > 0.55, and a boundary discontinuity appeared at the site of the sportsman's elbow. Meanwhile, when µ = 0.85, several small black holes appeared in the focus area, which caused the final fusion to contain wrong pixel information. This scenario did not occur when µ = 0.4 or 0.45.

However, the golf club wielded by the sportsman can also be observed. We observed that the boundary of the club becomes particularly thick, significantly exceeding the actual focused boundary, when the value of µ is too small. This results in residual artifacts at the border of the fused image. In summary, the best performance for image fusion was achieved when µ = 0.55. Hence, we set the threshold µ to 0.55.
Then, µ was fixed at 0.55, and different thresholds T were set to find the best T value. Table 1 lists the average scores of six evaluation metrics for different values of T on Lytro, with the best scores shown in bold. As observed in the table, three metrics achieved their best score when T = 15 or 17. Considering all the values, each metric performed at an intermediate-to-high level when T = 17. Hence, in this study, we set the threshold T to 17. In summary, the two key parameters were set to (µ = 0.55, T = 17).

Subjective Analysis of the Fusion Results
The fusion results of the source image "children" in the Lytro dataset for different methods are displayed in Figure 8. Each fused image was subtracted from the same source image to derive its difference map; the cleaner the focused region in the difference map, the better the performance of the method. The figures demonstrate that all the methods obtained good fusion results. The difference maps show that five methods, YMY, CSR, MFF-GAN, U2Fusion, and IFCNN, still retained several pieces of background information in the out-of-focus region. Additionally, not all pixels of their fused images originated from the focused region of the source image, which resulted in the loss of some information and reduced the clarity of the image (Figure 8e,f,i-k). Although the other methods did not show this effect, MWGF, HMD, GFDF, and ECNN also produced residual artifacts at the focus boundary, proving that none of them could well preserve the boundary between the focused and out-of-focus regions of the source image (Figure 8b-d,g). At the connection between the person's hat and ear, the NSCT-RR method produced a discontinuous focus boundary, and the MFF-SSIM method produced blur at the boundary of the ear. The resulting fused images were inferior to the image obtained by the proposed method in several aspects (Figure 8a,h). In contrast, the focused region in the difference map of the proposed method was clean, and the focused boundary was continuous (Figure 8j). This indicates that most of the focused pixels in the source image were retained in the fusion result. In summary, the proposed method efficiently preserved the visual quality of the fusion results, accurately judged the pixel focus characteristics, and led in overall performance compared to the 11 state-of-the-art methods.
The fusion results of the source image "globe" in the Lytro dataset obtained by different methods are displayed in Figure 9. Figure 9a,b shows the two multi-focus source images, and Figure 9c-l highlights the fusion results obtained by the different methods. For a better comparison of performance, local areas at the same location of each fused image were enlarged. First, as indicated by the red magnified area in Figure 9, YMY, CSR, ECNN, MFF-GAN, and U2Fusion produced images with blurred boundaries (Figure 9g-i,k,l). Although the quality of the fused images obtained by the GFDF, MFF-SSIM, and IFCNN methods was improved, their clarity was lower (Figure 9f,j,m). A closer examination of the edges of the person's hand in the images under NSCT-RR, MWGF, and HMD revealed discontinuities (Figure 9c-e). Compared with the above methods, the proposed method produced the most continuous, complete, and clean focus boundary in the fused image (Figure 9l). Hence, the algorithm can effectively transfer the focusing information of the source image and effectively determine the focusing properties of the focus boundary pixels.
Figure 10 illustrates the fusion results of the synthetic multi-focus source image "Woman in Hat" from the Grayscale dataset under different methods. Overall, all methods achieved acceptable fusion results. However, as observed in Figure 10e-k, the fusion results of the YMY, CSR, ECNN, MFF-SSIM, MFF-GAN, U2Fusion, and IFCNN methods lost some of the source image's focus information and produced blurriness. For example, their difference maps revealed the presence of residual information in the out-of-focus regions. The enlarged areas in Figure 10c-f confirmed that HMD, GFDF, YMY, and CSR could not accurately segment the focal region, and even produced incorrect segments at the boundaries. The MWGF method produced an evident artifact, which resulted in blurred edges in the fusion results (Figure 10b). Further, the NSCT-RR method effectively preserved the pixel information in the focused region of the source image (Figure 10a).


The enlarged areas in Figure 10c-f confirmed that HMD, GFDF, YMY, and CSR could not accurately segment the focal region and even produced incorrect segments at the boundaries. The MWGF method produced an evident artifact, which resulted in blurred edges in the fusion results (Figure 10b). Further, the NSCT-RR method effectively preserved the pixel information in the focused region of the source image (Figure 10a).
However, the performance of the proposed method in retaining the focused boundary was superior to that of NSCT-RR, as it produced better visual effects at the boundary (Figure 10l). In summary, the proposed method distinguished the focused and out-of-focus regions more accurately than the other methods, and the resulting fused images had better subjective visual quality.

Objective Analysis of the Fusion Results
In addition to the subjective visual comparison, we also objectively evaluated the 12 methods on the Lytro and Grayscale datasets; the top four scores for each metric are indicated in the tables. As shown in Tables 2 and 3, the proposed method scored the best on four indices, QMI, QNICE, QG, and QCB, reflecting the best retention of energy information, detail information, and human-visual effects. As indicated in Table 2, although the proposed algorithm did not achieve the best scores on the remaining four metrics, its scores on three of them, QM, QP, and AG, were among the top three, and its score on QCV was above average. In addition, Table 3 shows that the proposed algorithm achieved top-four scores on two metrics, QM and QP, and although it performed relatively poorly on the remaining metrics, AG and QCV, its overall performance across all metrics remained in the leading position. Notably, although the GFDF method scored the best in terms of QP and QCV, it exhibited mediocre performance on the other metrics. In addition, YMY, CSR, MFF-GAN, and U2Fusion performed relatively poorly. The quantitative evaluations of the NSCT-RR, HMD, GFDF, and IFCNN algorithms placed them in the intermediate- to high-performance categories, with several of their indicator scores among the top four. To summarize, the proposed algorithm outperformed the 11 comparison methods in the quantitative evaluation. This conclusion is consistent with the subjective visual analysis in Section 4.2, demonstrating the advantages of the proposed method over the state-of-the-art methods in both subjective and objective evaluations.
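As a concrete illustration of how such scores are produced, the mutual-information index QMI rewards fused images that share much of each source image's information content. The following is a minimal sketch of one common normalized-MI formulation; the exact definition used in the tables follows the cited metric literature, and the function names here are illustrative only:

```python
import numpy as np

def entropy(p):
    """Shannon entropy (base 2) of a normalized probability array."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_info(a, f, bins=256):
    """Mutual information between two grayscale images via a joint histogram."""
    joint, _, _ = np.histogram2d(a.ravel(), f.ravel(), bins=bins,
                                 range=[[0, 256], [0, 256]])
    joint /= joint.sum()
    return (entropy(joint.sum(axis=1)) + entropy(joint.sum(axis=0))
            - entropy(joint.ravel()))

def q_mi(src_a, src_b, fused, bins=256):
    """Normalized MI metric: 2*(MI(A,F)/(H(A)+H(F)) + MI(B,F)/(H(B)+H(F)))."""
    h = lambda img: entropy(np.histogram(img, bins, (0, 256))[0] / img.size)
    ha, hb, hf = h(src_a), h(src_b), h(fused)
    return 2.0 * (mutual_info(src_a, fused, bins) / (ha + hf)
                  + mutual_info(src_b, fused, bins) / (hb + hf))
```

Under this normalization, a fused image identical to both sources scores 2, and lower scores indicate information lost from the sources.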

Robustness Test to Defocus Spread Effect (DSE)
DSE is very important for MFIF, yet many current state-of-the-art multi-focus image fusion algorithms ignore its presence in multi-focus images. In images that suffer from DSE, some out-of-focus objects are significantly enlarged, which can cause the focus decision map to misjudge pixel focus attributes to the extent that incorrect pixel information is introduced into the fusion results. We introduced the MFFW [41] dataset to verify the robustness of the proposed algorithm to DSE. The scenes in this dataset are much more complex than those of the previous two datasets and exhibit obvious DSE, making good fusion performance on the MFFW dataset a major challenge (Figure 11). We performed a quantitative comparison between the proposed algorithm and the comparison methods using test data from the MFFW dataset to demonstrate the robustness of the proposed algorithm to DSE. In addition, we changed the parameter µ in Equation (16) to 0.4 and the definition of the small area in Equation (17) to S/30. With these changes, the parameters are better adapted to the MFFW dataset, yielding good fusion results even on datasets that suffer from DSE.
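The small-area rule referenced above can be illustrated with a sketch of consistency verification on a binary decision map: connected regions smaller than S/30 (S being the total pixel count) are flipped to match their surroundings. This is a simplified stand-in for the full check in Equation (17), with hypothetical function and parameter names:

```python
import numpy as np
from scipy import ndimage

def clean_decision_map(decision_map, area_divisor=30):
    """Flip connected regions smaller than S/area_divisor (S = total pixels)
    in both the focused (1) and defocused (0) parts of a binary decision map.
    A simplified sketch of small-region consistency verification."""
    out = decision_map.astype(bool).copy()
    threshold = out.size / area_divisor
    for value in (True, False):                 # clean foreground, then background
        labels, n = ndimage.label(out == value)
        sizes = np.bincount(labels.ravel(), minlength=n + 1)
        for lab in range(1, n + 1):
            if sizes[lab] < threshold:          # region too small to be reliable
                out[labels == lab] = not value
    return out.astype(np.uint8)
```

Removing such small isolated regions suppresses isolated focus misjudgments, which DSE makes more frequent, before the final fusion step.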
For the quantitative comparison, we used eight evaluation metrics to score the different fusion results; Table 4 lists the average scores of the various methods on the MFFW dataset. Table 4 shows that the proposed algorithm achieved the best scores on four evaluation metrics and also achieved top-four scores on QM and QP. It is worth noting that NSCT-RR, MWGF, HMD, and GFDF also performed very well, achieving top-four scores on several metrics. Combining all the metrics, we can conclude that the proposed algorithm is robust to DSE and can still effectively retain the detail and gradient information of the source images on a dataset suffering from DSE; further, its fusion performance is better than that of some state-of-the-art algorithms.

Conclusions
In this study, a multi-focus image fusion algorithm based on Hessian matrix decomposition and salient-difference focus detection was proposed. The method uses a multiscale Hessian matrix to extract the feature regions of the image, deriving the salient information of the source image more comprehensively for the focus measure (FM). To accurately determine the focus characteristics of each pixel, a focus-difference analysis scheme based on SML was proposed, which effectively improved the accuracy of judging the focusing characteristics of pixels. Furthermore, considering that images of different sizes adapt to the algorithm to different degrees, an adaptive multiscale consistency verification algorithm that leverages the correlation between each pixel and its surrounding pixels was proposed to optimize the decision map. The method was compared with 11 state-of-the-art methods, with all methods tested on three public multi-focus datasets and quantitatively analyzed using eight popular metrics. The results showed that the proposed algorithm efficiently transferred the focusing information of the source images to the fusion results and outperformed some state-of-the-art algorithms in both subjective visual and objective evaluations. Further research should focus on thoroughly uncovering the impact of DSE on multi-focus image fusion and finding more efficient ways to solve the DSE problem.
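For illustration, the classical SML focus measure on which the proposed focus-difference analysis builds can be sketched as follows. This is the standard baseline formulation only; the improved SML in this paper adds refinements not shown here:

```python
import numpy as np
from scipy import ndimage

def sml(image, window=5):
    """Sum of modified Laplacian: |2I - I_left - I_right| + |2I - I_up - I_down|,
    accumulated over a local window. Higher values indicate sharper (in-focus)
    pixels; flat or blurred regions score near zero."""
    img = np.asarray(image, dtype=np.float64)
    kx = np.array([[0., 0., 0.], [-1., 2., -1.], [0., 0., 0.]])
    ml = (np.abs(ndimage.convolve(img, kx, mode='nearest'))
          + np.abs(ndimage.convolve(img, kx.T, mode='nearest')))
    # uniform_filter averages over the window; scale back to a windowed sum
    return ndimage.uniform_filter(ml, size=window) * (window ** 2)
```

Comparing the per-pixel SML maps of the two source images yields an initial focus decision map, which the consistency verification step then refines.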