Multi-Sensor Medical-Image Fusion Technique Based on Embedding Bilateral Filter in Least Squares and Salient Detection

A multi-sensor medical-image fusion technique, which integrates useful information from different single-modal images of the same tissue and provides a fused image that is more comprehensive and objective than any single-source image, is becoming increasingly important in clinical diagnosis and treatment planning. The salient information in medical images often visually describes the tissue. To effectively embed salient information in the fused image, a multi-sensor medical-image fusion method is proposed based on embedding the bilateral filter in least squares and salient detection via a deformed smoothness constraint. First, the source images are decomposed into base and detail layers using the bilateral filter embedded in least squares. Then, the detail layers are treated as superpositions of salient regions and background information; a fusion rule for this layer based on the deformed smoothness constraint and guided filtering is designed to conserve the salient structure and detail information of the source images. A base-layer fusion rule based on modified Laplace energy and local energy is proposed to preserve the energy information of the source images. The experimental results demonstrate that the proposed method outperformed nine state-of-the-art methods in both subjective and objective quality assessments on the Harvard Medical School dataset.


Introduction
The technique of image fusion integrates multiple images generated by different sensors with different descriptions of the same scene to produce an image with more compatible and accurate information [1]. The main image fusion technologies include multi-focus image fusion, medical image fusion, infrared and visible image fusion, remote sensing image fusion, etc. This technique has been widely applied in the fields of surveillance, clinical diagnostics, automation, national defense, biometrics, and remote sensing. Medical image fusion, which integrates all the useful complementary information from different medical images into a fused image, is a branch of image fusion that occupies a crucial position in research. The use of a single sensor-formed image as a basis for judgment has limitations when describing the health status of tissues (for example, computed tomography (CT) detects only dense structures such as bones and implants; magnetic resonance imaging (MRI) provides soft tissue information; positron emission tomography (PET) reflects the biological activity of cells and molecules; and single-photon emission computed tomography (SPECT) reflects the blood flow through tissues and organs). Fused images can provide a more comprehensive and reliable description of lesions, thereby making a significant contribution to biomedical research and clinical practice.
Spatial-domain-based (SDB) methods rely on detecting pixel-level activities, which reflect features such as the level of image sharpness and structural saliency. The main steps are as follows. First, the activity of a pixel or region is detected by a specific function or algorithm to obtain the activity map of the image. Then, according to a given rule (e.g., the "maximum absolute value (Abs)" rule), an active decision map is generated. Finally, the decision map is used to reconstruct the source images into a fused image.
In SDB methods, image processing using edge-preserving filters has become increasingly common. The base layer of the image is obtained using an edge-preserving filter to capture large variations, together with a set of detail layers that preserve detail at progressively finer levels. Mo et al. [23] proposed an attribute-filter-based image fusion method wherein the prominent objects in the image were first extracted using attribute- and edge-preserving filters, and the fusion results were then obtained using a weight-based Laplacian-pyramid image-fusion strategy. Overall, SDB methods are simple and fast, but pixel-level activity detection is not an easy task, and incorrect activity detection may lead to blocking (region) artifacts, introduce spectral distortions, and degrade the sharpness of the fusion results [24].
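The three SDB steps described above can be sketched with the simplest possible choices: the activity of each pixel is its absolute value, and the decision map follows the Abs rule. The function name and the activity measure are illustrative, not from the paper; practical methods use a sharpness or saliency operator instead of the raw absolute value.

```python
import numpy as np

def fuse_max_abs(src_a, src_b):
    # Step 1: pixel-level activity detection (here simply the absolute value).
    act_a, act_b = np.abs(src_a), np.abs(src_b)
    # Step 2: the "maximum absolute value (Abs)" rule yields a binary decision map.
    decision = act_a >= act_b
    # Step 3: reconstruct the fused image from the decision map.
    return np.where(decision, src_a, src_b)
```
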
Although existing multi-sensor medical-image fusion techniques have achieved great success, certain shortcomings still exist. For example, the atoms of the dictionary in SRB methods have a limited ability to represent salient features in the image [25]. The fusion rules in TDB and SDB methods are often based on pixels or regions without consideration of the edges or structures in the image [26]. In addition, most existing methods pay little attention to salient information, which is often a visual reflection of tissue health status in medical images. To retain the salient information in the source images, in this paper, we propose a medical-image fusion method based on embedding the bilateral filter in least squares (BLF-LS) and the deformed smoothness constraint (DSC) [24], which can effectively retain the salient information, edges, and energy of the source images.
The BLF-LS is a recently developed edge-preserving filter. It takes advantage of bilateral filtering and the least-squares (LS) model, effectively smoothing the edges within the texture region while producing results without gradient reversals and halos; it also offers the advantage of fast operation [27]. Therefore, we introduced the BLF-LS to decompose the source image. A fusion rule combining DSC and the rolling guidance filter (RGF) [25] was designed to fuse detail layers. Saliency describes what attracts the visual attention of humans in a bottom-up manner. Salient detection can maintain the integrity of important target regions and enables high-quality image fusion. The main contributions of this study are as follows:

1.
A medical image fusion method based on the BLF-LS and salient detection is proposed.
To the best of our knowledge, this is the first time the BLF-LS has been applied in medical-image fusion. The source images are decomposed into the detail and base layers.

2.
A detail-layer fusion rule based on DSC and RGF is proposed, which fully considers the low contrast between the target and background.

3.
A fusion rule based on modified Laplace energy and local energy (MLEN) is designed to maintain detail information and energy in the base layer.

4.
The proposed fusion method can be effectively extended to the IR- and VIS-image fusion problem and yields competitive fusion performance.
The remainder of this paper is organized as follows. In Section 2, the background of the BLF-LS and salient detection using a DSC is briefly introduced. Section 3 explains the proposed image-fusion algorithm. The experimental results and discussion are presented in Section 4. Finally, Section 5 concludes the paper.

Embedding Bilateral Filter in Least Squares
Edge-preserving filters offer many advantages, such as accurately separating image structures at different scales while maintaining the spatial consistency of these structures, reducing the blurring effects around edges, providing good edge- and boundary-preserving performance, and smoothing background information. The BLF-LS is an edge-preserving filter achieved using global methods [27]. Its smoothing result is free of gradient reversals and halos. Additionally, the BLF-LS runs quickly because it exploits the efficiency of both the bilateral filter (BLF) and the LS model. To facilitate the understanding of the BLF-LS, we first describe the BLF. For a given image g, the output image µ of the BLF is computed as follows:

$$\mu_s=\frac{\sum_{t\in\Omega_s}G_{\sigma_S}(\|s-t\|)\,G_{\sigma_r}(|g_s-g_t|)\,g_t}{\sum_{t\in\Omega_s}G_{\sigma_S}(\|s-t\|)\,G_{\sigma_r}(|g_s-g_t|)},\tag{1}$$

where s and t denote different pixel points, $\Omega_s$ denotes the neighborhood of s, $G_{\sigma_S}$ denotes the Gaussian kernel that determines the spatial support, and $G_{\sigma_r}$ denotes the Gaussian kernel that controls the sensitivity to edges. The BLF has the advantage of fast image processing. However, because edges are sharpened in the smoothed image and boosted in the reverse direction in the enhanced image, gradient reversals and halos appear in the result. Let $f_{BLF}(\nabla g_*)$ denote the smoothed gradients of the input image g under the BLF, where $*\in\{x,y\}$ denotes the axis direction; embedding $f_{BLF}(\nabla g_*)$ into the LS framework achieves efficient edge-preserving smoothing. This allows the BLF-LS to combine the smoothing quality of the LS and BLF models with high processing efficiency, as described below. Given an input image g, the output image µ of the BLF-LS is:

$$\mu=\arg\min_{\mu}\sum_{s}\Big((\mu_s-g_s)^2+\lambda\sum_{*\in\{x,y\}}\big(\nabla\mu_{*,s}-f_{BLF}(\nabla g_*)_s\big)^2\Big),\tag{2}$$

where s denotes the pixel position. When the value of λ is large enough, the gradient of the image µ, that is, $\nabla\mu_{*,s}$, will resemble $f_{BLF}(\nabla g_*)_s$ ($*\in\{x,y\}$), which guarantees the smoothing quality of the BLF-LS. Because the LS model can be solved in the Fourier domain, the speed of the BLF-LS is guaranteed.
Equation (2) can be solved as follows:

$$\mu=\mathcal{F}^{-1}\!\left(\frac{\mathcal{F}(g)+\lambda\sum_{*\in\{x,y\}}\overline{\mathcal{F}(\nabla_*)}\,\mathcal{F}\big(f_{BLF}(\nabla g_*)\big)}{\mathcal{F}(1)+\lambda\sum_{*\in\{x,y\}}\overline{\mathcal{F}(\nabla_*)}\,\mathcal{F}(\nabla_*)}\right),\tag{3}$$

where $\mathcal{F}(\cdot)$ and $\mathcal{F}^{-1}(\cdot)$ are the fast Fourier transform (FFT) and inverse fast Fourier transform (IFFT) operators, respectively; $\overline{\mathcal{F}(\cdot)}$ denotes the complex conjugate of $\mathcal{F}(\cdot)$; and $\mathcal{F}(1)$ is the FFT of the delta function. Additionally, multiplication and division are both point-wise operations.
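The Fourier-domain solution of Equation (3) can be sketched in a few lines of numpy. In this sketch the BLF-smoothed gradients $f_{BLF}(\nabla g_x)$ and $f_{BLF}(\nabla g_y)$ are assumed to be precomputed and passed in as arrays; `psf2otf` and `ls_solve_fft` are illustrative helper names, not from the paper.

```python
import numpy as np

def psf2otf(psf, shape):
    # Zero-pad the filter kernel to the image size, circularly shift its
    # center to the origin, and take the FFT (optical transfer function).
    pad = np.zeros(shape)
    pad[:psf.shape[0], :psf.shape[1]] = psf
    for axis, s in enumerate(psf.shape):
        pad = np.roll(pad, -(s // 2), axis=axis)
    return np.fft.fft2(pad)

def ls_solve_fft(g, fx, fy, lam):
    # Closed-form Fourier-domain minimizer of
    #   sum_s (mu_s - g_s)^2 + lam * ||grad(mu)_s - (fx_s, fy_s)||^2,
    # i.e., the structure of Equation (3).
    H, W = g.shape
    Dx = psf2otf(np.array([[1.0, -1.0]]), (H, W))    # horizontal forward difference
    Dy = psf2otf(np.array([[1.0], [-1.0]]), (H, W))  # vertical forward difference
    num = np.fft.fft2(g) + lam * (np.conj(Dx) * np.fft.fft2(fx)
                                  + np.conj(Dy) * np.fft.fft2(fy))
    den = 1.0 + lam * (np.abs(Dx) ** 2 + np.abs(Dy) ** 2)
    return np.real(np.fft.ifft2(num / den))
```

A sanity check of the design: if the target gradients (fx, fy) are exactly the circular forward differences of g, the minimizer is g itself for any λ.
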

Salient Detection via Deformed Smoothness Constraint
The DSC [28] is a propagation model that can capture significant targets when there is low contrast between the object regions and the background. It comprises three main steps. First, the image is segmented into superpixels, and the segmentation result is represented as a graph. Then, a coarse map is generated via the background seeds and a deformed smoothness-based manifold ranking model, and the objectness map is built through the object proposal. Finally, the coarse and objectness maps are used to generate a refined map.
The input image I generates a saliency detection map g by minimizing a deformed smoothness-based ranking model (Equation (4)), in which M_c is the coarse map; D_c and v_c are the degree matrix and volume of M_c, respectively; W_c is a weight matrix computed from M_c; µ is a non-negative parameter that balances the weights of the two smoothness constraints; M_o = [m_i^o]_n denotes the objectness map obtained for each node using an edge box; and a diagonal matrix computed from M_o also enters the model. The optimal solution of Equation (4) can be expressed in closed form (Equation (5)). The elements of g are normalized to [0, 1] and assigned to the corresponding superpixels to generate the saliency detection map.

Proposed Method
The proposed method is illustrated in Figure 1. First, the source images are decomposed into a base layer and a detail layer via the BLF-LS. The base layer, which is obtained by decomposing the source image, is fused based on the MLEN fusion rules to retain the energy information of the source image. Moreover, the detail layers are considered a superposition of the salient regions and background information. The detail layers are decomposed into background-detail layers and salient-detail layers using a model based on the DSC and RGF. To fully retain the energy information, the background-detail layer of the fused image is obtained using the fusion rule Abs. Regarding the salient-detail layers that contain important salient targets, the overlap between the two salient-detail layers is removed using the DSC-RGF model, and a direct summation method is used to obtain the salient-detail layer of the fused image. Finally, the fused image is obtained by reconstruction. Additionally, for a functional medical image fusion problem, the following conversion scheme is used: red, green, and blue (RGB)→luma, blue projection, and red projection (YUV)→RGB, as shown in Figure 2.

Decomposition of Base Layer and Detail Layer
The BLF-LS achieves effective edge-preserving smoothing by embedding the BLF into the LS framework. First, we employed the BLF-LS to decompose the source images into a base layer and a detail layer. The base layers are obtained as follows:

$$B_n=F_{BLF\text{-}LS}(I_n),\quad n\in\{1,2\},\tag{6}$$

where I_1 and I_2 denote the source images; B_1 and B_2 are the base layers obtained by decomposing I_1 and I_2, respectively; and F_BLF-LS is the BLF-LS smoothing operator described in Equation (2). After the base layer is obtained, it is subtracted from the source image to obtain the detail layer:

$$D_n=I_n-B_n,\quad n\in\{1,2\},\tag{7}$$

where D_1 and D_2 are the detail layers obtained by decomposing I_1 and I_2, respectively. The base layer captures large variations in intensity, and the detail layer preserves details at fine scales.
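Equations (6) and (7) amount to the following two-scale split. In this sketch a 3×3 box mean stands in for the BLF-LS smoother; the function names and the stand-in filter are illustrative, not from the paper.

```python
import numpy as np

def box_mean(img):
    # Crude stand-in smoother: 3x3 box mean with circular edge handling.
    # The paper uses the edge-preserving BLF-LS here instead.
    acc = np.zeros_like(img, dtype=float)
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            acc += np.roll(np.roll(img, dx, axis=0), dy, axis=1)
    return acc / 9.0

def decompose(img, smooth=box_mean):
    # Eq. (6): base layer = smoothed image; Eq. (7): detail = residual.
    base = smooth(img)
    return base, img - base
```

By construction, base + detail reproduces the source image exactly, which is what makes the final reconstruction by summation possible.
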

Decomposition of Detail Layer Based on DSC-RGF Algorithm
In recent years, many saliency detection methods that can detect salient visual areas or objects and easily draw visual attention have been proposed. It is typically easier to detect salient targets in the detail layer obtained using a smoothing filter. To this end, we designed a method based on the DSC and RGF to fuse the detail layer, as shown in Figure 3; this method mainly consists of the following steps. First, the initial salient decision map is obtained by applying the DSC to the detail layer. Second, the overlapping part of the two initial salient decision maps produces ghosting in the fusion results and affects the visual effect; therefore, the overlap-removal procedure is performed on the initial salient decision map. Then, in view of the edge-smoothing problem of the significant target in the salient decision map, the RGF is used to process the salient decision map to obtain the salient guided filtering (SGF) map. Finally, the detail layer is decomposed into background and salient-detail layers using this map.
The details of each step are as follows. First, the DSC model is applied to each detail layer to detect the salient information, and threshold correction is adopted to generate the initial salient decision maps:

$$I^n_{ISD}(x,y)=\begin{cases}1,&F_{DSC}\big(D_n(x,y)\big)>T\\0,&\text{otherwise}\end{cases}\quad n\in\{1,2\},\tag{8}$$

where F_DSC denotes the salient-detection operation using the DSC model in Equation (4); T is the threshold value; and I^1_ISD(x, y) and I^2_ISD(x, y) are the initial salient decision maps obtained from D_1(x, y) and D_2(x, y), respectively.
Second, it is necessary to remove the overlapping part I_R, which is generated by multiplying I^1_ISD and I^2_ISD (I_R = I^1_ISD · I^2_ISD); passing it directly into the fusion result causes ghosting, which affects the visual effect:

$$I^n_{SD}=I^n_{ISD}-I_R,\quad n\in\{1,2\},\tag{9}$$
where I^1_SD and I^2_SD represent the salient decision maps obtained after removing the overlapping parts. Considering the edge-smoothing problem of the significant target in the salient decision map, we used the RGF to process I^n_SD and obtain the SGF maps:

$$I^n_{SGF}=F_{RGF}\big(I^n_{SD},r,\varepsilon,T'\big),\quad n\in\{1,2\},\tag{10}$$
where F_RGF(·) represents the RGF function; T′ denotes the number of iterations; r denotes the filter size; ε denotes the degree of blur; and I^1_SGF and I^2_SGF denote the SGF maps used to decompose the detail layers of the source images.
Finally, the salient-detail layers are obtained by multiplying the SGF maps by the detail layers, and the background-detail layers are obtained by removing the salient parts of the detail layers, as described below:

$$SD_n(x,y)=I^n_{SGF}(x,y)\cdot D_n(x,y),\quad n\in\{1,2\},\tag{11}$$

$$BD_n(x,y)=D_n(x,y)-SD_n(x,y),\quad n\in\{1,2\},\tag{12}$$
where SD_1(x, y) and SD_2(x, y) denote the salient-detail layers obtained from the decomposition of D_1(x, y) and D_2(x, y), respectively; and BD_1(x, y) and BD_2(x, y) denote the background-detail layers obtained from the decomposition of D_1(x, y) and D_2(x, y), respectively.
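Equations (8)-(12) can be summarized in the following sketch. A simple normalized-magnitude map stands in for the DSC saliency model, and the RGF smoothing step is a pluggable callable; all names are illustrative, not from the paper.

```python
import numpy as np

def split_detail_layers(d1, d2, T=0.07, smooth=lambda m: m):
    # Stand-in saliency: detail magnitude normalized to [0, 1]
    # (the paper uses the DSC model here).
    def saliency(d):
        mag = np.abs(d)
        return mag / mag.max() if mag.max() > 0 else mag

    isd1 = (saliency(d1) > T).astype(float)   # Eq. (8): threshold correction
    isd2 = (saliency(d2) > T).astype(float)
    overlap = isd1 * isd2                     # I_R: overlap of the decision maps
    sgf1 = smooth(isd1 - overlap)             # Eqs. (9)-(10); the paper smooths with the RGF
    sgf2 = smooth(isd2 - overlap)
    sd1, sd2 = sgf1 * d1, sgf2 * d2           # Eq. (11): salient-detail layers
    bd1, bd2 = d1 - sd1, d2 - sd2             # Eq. (12): background-detail layers
    return (sd1, bd1), (sd2, bd2)
```

Note that Eq. (12) guarantees that the salient and background parts of each detail layer always sum back to the original detail layer, regardless of the saliency model used.
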

Fusion of Base Layer Based on MLEN
The pair of base layers obtained from the BLF-LS decomposition contains abundant energy information and little detail information from the source images. Therefore, we used the local energy (LEN) to extract the energy information and the sum-modified Laplacian (SML) to extract the detail-related information from the base layers, and finally combined these two types of information to obtain the fused base layer. The SML is defined as follows [29]:

$$SML_n(x,y)=\sum_{i=-(M-1)/2}^{(M-1)/2}\;\sum_{j=-(N-1)/2}^{(N-1)/2}\big[ML_n(x+i,y+j)\big]^2,\tag{13}$$

where M × N denotes the window size centered at (x, y), and ML_n(x, y) denotes the modified Laplacian (ML) at point (x, y), defined as follows:

$$ML_n(x,y)=\big|2B_n(x,y)-B_n(x-1,y)-B_n(x+1,y)\big|+\big|2B_n(x,y)-B_n(x,y-1)-B_n(x,y+1)\big|,\tag{14}$$

where B_1 and B_2 are the base layers decomposed from I_1 and I_2, respectively. The LEN is defined as follows:

$$LEN_n(x,y)=\sum_{i=-(M-1)/2}^{(M-1)/2}\;\sum_{j=-(N-1)/2}^{(N-1)/2}\big[B_n(x+i,y+j)\big]^2,\tag{15}$$

where M × N denotes the window size centered at (x, y). The base layers are then fused by combining the detail information selected by the SML with the energy information selected by the LEN (Equation (16)), where B_F(x, y) denotes the base layer of the fused image.
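The two activity measures of Equations (13)-(15) are standard and can be sketched in numpy as follows. Circular edge handling is a simplification, and the function names are illustrative.

```python
import numpy as np

def modified_laplacian(b):
    # Eq. (14): |2B - B(x-1,y) - B(x+1,y)| + |2B - B(x,y-1) - B(x,y+1)|.
    up, down = np.roll(b, 1, axis=0), np.roll(b, -1, axis=0)
    left, right = np.roll(b, 1, axis=1), np.roll(b, -1, axis=1)
    return np.abs(2 * b - left - right) + np.abs(2 * b - up - down)

def window_sum(x, r=1):
    # Sum over a (2r+1) x (2r+1) window with circular edge handling.
    acc = np.zeros_like(x, dtype=float)
    for dx in range(-r, r + 1):
        for dy in range(-r, r + 1):
            acc += np.roll(np.roll(x, dx, axis=0), dy, axis=1)
    return acc

def sml(b, r=1):
    # Eq. (13): windowed sum of squared modified Laplacian values.
    return window_sum(modified_laplacian(b) ** 2, r)

def len_energy(b, r=1):
    # Eq. (15): windowed sum of squared intensities (local energy).
    return window_sum(b.astype(float) ** 2, r)
```

On a constant region the SML is zero (no detail) while the LEN is large, which is exactly why the two measures are complementary for base-layer fusion.
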

Fusion Result
BD_1(x, y) and BD_2(x, y) contain most of the energy information of the source images. To avoid excessive energy loss in the fused image, we used the Abs fusion rule to obtain the background-detail layer of the fused image, BD_F(x, y):

$$BD_F(x,y)=\begin{cases}BD_1(x,y),&|BD_1(x,y)|\ge|BD_2(x,y)|\\BD_2(x,y),&\text{otherwise}\end{cases}\tag{17}$$

Because the salient-detail layers contain significant information, they are fused by direct summation, as follows:

$$SD_F(x,y)=SD_1(x,y)+SD_2(x,y),\tag{18}$$

where SD_F(x, y) denotes the salient-detail layer of the fused image. Finally, the fused image is obtained by combining the base, salient-detail, and background-detail layers, as follows:

$$F(x,y)=B_F(x,y)+SD_F(x,y)+BD_F(x,y).\tag{19}$$

The formal mechanism of the proposed method is described in Algorithm 1.
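The three fusion rules above, Abs for the background details, summation for the salient details, and summation of the three layers, reduce to a few lines; this is a sketch with an illustrative function name.

```python
import numpy as np

def reconstruct(b_f, sd1, sd2, bd1, bd2):
    # Eq. (17): the Abs rule keeps the background detail with the larger magnitude.
    bd_f = np.where(np.abs(bd1) >= np.abs(bd2), bd1, bd2)
    # Eq. (18): salient-detail layers are fused by direct summation.
    sd_f = sd1 + sd2
    # Eq. (19): the fused image is the sum of the three layers.
    return b_f + sd_f + bd_f
```
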

Algorithm 1 Steps in proposed fusion method
Inputs: Medical CT image I_1; medical MRI image I_2
Output: Fused image F
Step 1: The BLF-LS is employed to decompose I_1 and I_2 to obtain the corresponding base layers B_1 and B_2 and detail layers D_1 and D_2 (Equations (6) and (7)).
Step 2: The DSC-RGF algorithm is utilized to decompose the detail layers D_1 and D_2 to obtain the corresponding significant-detail layers SD_1 and SD_2 and background-detail layers BD_1 and BD_2 (Equations (8)-(12)).
Step 3: The fused base layer B_F is obtained using the MLEN rule (Equations (13)-(16)). Then, the Abs fusion rule is employed to fuse BD_1 and BD_2 and thereby obtain the fused background-detail layer BD_F (Equation (17)). SD_1 and SD_2 are added to obtain the significant-detail layer SD_F of the fused image (Equation (18)).
Step 4: The fused image F is obtained by summing B_F, SD_F, and BD_F (Equation (19)).

Test Data
The experimental dataset was selected from 100 sets each of CT-MRI, PET-MRI, and SPECT-MRI images, amounting to 300 source images for testing. The source images were images of the human brain captured by different imaging mechanisms; each image was 256 × 256 pixels, and each pair of images was aligned. The test images were obtained from a database of Harvard Medical School (http://www.med.harvard.edu/aanlib/home.html, accessed on 8 November 2022).

Quantitative Evaluation Metrics
Subjective quality assessment of image fusion reflects human intuition but lacks quantitative description, so objective quality assessment is also needed to evaluate the performance of fusion algorithms. The metrics used for the objective quality evaluation of images typically fall into three categories: information-theory-based, image-feature-based, and human-perception-inspired fusion metrics. In this study, six common metrics were selected to objectively assess the fusion performance: normalized mutual information (Q_MI) [30], an image fusion metric based on a multiscale scheme (Q_M) [31], nonlinear correlation information entropy (Q_NCIE) [32], a metric based on phase congruency (Q_P) [33], entropy (EN) [34], and visual information fidelity (VIF) [35].
Q_MI is a quality index that describes the quantity of information conveyed from the source images to the fused image; Q_NCIE displays the nonlinear correlation degree of the concerned multivariable dataset; EN measures the amount of information contained in the fused image; Q_MI, Q_NCIE, and EN are evaluation metrics based on information theory. Q_M evaluates the retention of edge information in fused images at multiple scales; Q_P is defined by the maximum and minimum moments of phase coherence and is used to evaluate the angle and edge information measures; Q_M and Q_P are evaluation metrics based on image features. The VIF metric measures the information fidelity of the fused image, where the image distortions include additive noise, blur, and global or local changes in contrast; VIF is a fusion measure inspired by human perception. Table 1 shows a summary of these six metrics. A comprehensive and objective evaluation of the fused-image quality is achieved by considering these metrics together; the larger the value of each metric, the better the quality of the fused image [36].
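As an example of the information-theory-based group, EN can be computed directly from the grey-level histogram. This is a minimal sketch assuming integer grey levels; the function name is illustrative.

```python
import numpy as np

def entropy(img, levels=256):
    # EN: Shannon entropy of the grey-level histogram, in bits.
    hist, _ = np.histogram(img, bins=levels, range=(0, levels))
    p = hist / hist.sum()
    p = p[p > 0]                    # drop empty bins (0 * log 0 := 0)
    return float(-np.sum(p * np.log2(p)))
```

A flat (constant) image has EN = 0, while an image whose grey levels are uniformly distributed attains the maximum of log2(levels) bits.
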

Parameter Analysis
Different parameters determine the performance of the algorithm. In the proposed method, the parameter T in Equation (8) plays a decisive role, mainly because after the detail layers are processed using saliency detection to obtain salient information, a decision map (SGF) is needed to extract this information into the salient detail layer. From Equation (11), the more favorable the SGF is to the salient information, the more salient information is contained in the salient detail layer. According to Equation (8), the threshold value T is the key factor affecting the initial salient decision maps, and it influences the SGF through Equations (9) and (10). For smaller values of T, the SGF is more favorable to the salient detail layers, and more salient information is contained in them. However, when T is too small, the noise in the salient detail layer cannot be reduced effectively, so a reasonable value of T is needed. The selection of the parameter T in the proposed model is discussed here. We selected five sets of source images and varied T over the range 0.01-0.09. Because it was difficult to clearly distinguish the differences in quality of these fusion results using only subjective quality assessment, six metrics were used to evaluate the fusion results, and the average objective evaluation over the five sets of images was obtained, as shown in Figure 4. Q_MI attains a large value when T is 0.01; Q_M, Q_P, and VIF attain large values when T is 0.08; Q_NCIE attains a large value when T is 0.07; and EN attains a large value when T is 0.04. Considering these indicators together, we set T to 0.07. For the other parameters, the RGF was set to [filter size r = 3, blur degree ε = 0.3, iteration number T′ = 4]. Based on previous suggestions [27], the BLF-LS was set to [σ_S = 12, σ_r = 0.02], and the window sizes for the SML in Equation (13) and the LEN in Equation (15) were set to 3 × 3.

Subjective Quality Assessment
For conciseness, we have only shown the results of three sets of images in the subjective evaluation. Figures 5-7 show the fusion results of different types of medical images obtained by the different image fusion algorithms.
The fusion results of the different methods on CT-MRI medical images are shown in Figure 5. The local areas are marked by colored rectangles, which are enlarged in the lower left corners for better comparison. All the methods retained the main information and features, as shown in Figure 5; however, there were still significant differences regarding the features. ILLF showed color distortion, which led to the introduction of speckles in the fusion results. Zero-LF, IFCNN, SDNet, EMFusion, and U2Fusion could not completely retain the energy in the CT images, leading to low brightness and contrast in the fusion results. Second, NSCT-PCLLE, NSST-PCNN, LDR, SDNet, EMFusion, and U2Fusion were unable to retain the detail information in the MRI images (yellow part of the magnified area). Figure 5 shows that the proposed method outperformed the other methods in terms of the energy retention of the CT-MRI source images. It also preserved information such as the details and structures in the source images without artifacts or brightness distortion.
Figure 6 shows a set of PET-MRI images fused by the different methods. The fusion results of ILLF, LDR, Zero-LF, IFCNN, and U2Fusion show an insufficient ability to retain the color in the PET images, which led to color distortion in the fusion results. LDR, SDNet, EMFusion, and U2Fusion performed poorly in retaining the luminance information of the MRI images; a luminance-oversaturation phenomenon occurred in the fusion results under LDR. Under SDNet, EMFusion, and U2Fusion, most of the energy from the MRI images was lost, particularly under SDNet, which led to a low overall illuminance in the images.
The enlarged portions in the lower left corners show that our method was able to retain the detailed parts of the MRI image. Additionally, under our method, the jagged edges can be observed in the fused image, while these edges are slightly missing under the other methods, demonstrating the superior performance of our method. Overall, Figure 6 indicates that our method could retain the structural information of the source image and outperformed the other methods in expressing intensity-based features.
In Figure 7, under ILLF, LDR, IFCNN, and EMFusion, there are color deviations from the source SPECT image: the ILLF results show grayscale information, the LDR and IFCNN results show lighter colors, and the EMFusion results show color enhancement. NSCT-PCLLE, NSST-PCNN, Zero-LF, SDNet, and U2Fusion did not completely capture the luminance information of the MRI images, which is represented by small black shading in the marked red area; their fusion results show black blocks. As shown by the green enlarged area in the lower left corner of the images, ILLF, NSCT-PCLLE, LDR, IFCNN, SDNet, and U2Fusion were not completely capable of retaining the details in the source image, while our method retained them well. Figure 7 shows that the images fused under our proposed method are more informative, clearer, and have a higher contrast than those under the existing methods.
Tables 2-4 show the objective evaluations of the different methods. Table 2 shows the objective evaluation results for the CT-MRI images. Our proposed method ranked first for the indicators Q_MI, Q_NCIE, Q_M, Q_P, and VIF. This shows that our method obtained good results regarding the amount of information transferred from the source image to the fused image, the degree of nonlinear correlation, edge information, phase consistency, and information fidelity. Although it did not rank high for EN, the difference between its value and the highest value was small; therefore, we concluded that the proposed method produced good results for the CT-MRI images under the objective evaluation assessment.
Tables 3 and 4 show the objective evaluation results of the nine methods on the PET-MRI and MRI-SPECT color images, respectively. Our method did not rank first in certain metrics, but its overall ranking was at the top.
It also achieved good results for the PET-MRI and MRI-SPECT images under the objective evaluation assessment.
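To make the information-theoretic metrics concrete, the sketch below computes the entropy EN of an image and the mutual information between a source image and the fused image, together with one common normalized form of Q_MI (the exact normalization used in the paper's tables may differ; this is an illustrative assumption):

```python
import numpy as np

def entropy(img, bins=256):
    """Shannon entropy (EN) of an 8-bit grayscale image."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_info(a, f, bins=256):
    """Mutual information I(A;F) estimated from the joint histogram."""
    joint, _, _ = np.histogram2d(a.ravel(), f.ravel(), bins=bins,
                                 range=[[0, 256], [0, 256]])
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of A
    py = pxy.sum(axis=0, keepdims=True)   # marginal of F
    nz = pxy > 0
    return np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz]))

def q_mi(a, b, f):
    """A common normalized Q_MI: information transferred from both
    sources A and B into the fused image F."""
    return 2.0 * (mutual_info(a, f) / (entropy(a) + entropy(f))
                  + mutual_info(b, f) / (entropy(b) + entropy(f)))
```

When the fused image is identical to both sources, this normalized Q_MI reaches its maximum value of 2, which matches the intuition that all source information was transferred.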

Objective Quality Assessment
Based on the above subjective visual evaluation and objective metric analysis, we concluded that the fusion performance of our method was the highest of all the methods. This was mainly due to the good decomposition of the detail and base layers of the images using the BLF-LS, the effective preservation of the salient structure and edge information of the source image in the fused image using saliency detection, and the processing of the weight map using the RGF, which makes full use of the strong correlation between neighboring pixels.
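The overall pipeline summarized above can be sketched as a generic two-scale fusion. This is not the paper's BLF-LS method: a Gaussian blur stands in for the BLF-LS edge-preserving decomposition, a local-energy weight stands in for the modified-Laplacian base rule, and a smoothed max-absolute choice map stands in for the RGF-refined saliency weights:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def two_scale_fuse(a, b, sigma=5.0):
    """Two-scale fusion sketch (stand-ins for BLF-LS, MLEN, and RGF)."""
    a = a.astype(np.float64)
    b = b.astype(np.float64)
    # Decompose each source into a smooth base layer and a detail layer.
    base_a, base_b = gaussian_filter(a, sigma), gaussian_filter(b, sigma)
    det_a, det_b = a - base_a, b - base_b
    # Base layer: local-energy weighted average.
    ea = gaussian_filter(base_a ** 2, sigma)
    eb = gaussian_filter(base_b ** 2, sigma)
    w = ea / (ea + eb + 1e-12)
    base_f = w * base_a + (1 - w) * base_b
    # Detail layer: max-absolute selection, with the binary choice map
    # smoothed so that neighboring pixels make consistent decisions.
    choose_a = gaussian_filter((np.abs(det_a) >= np.abs(det_b)).astype(float), 1.0)
    det_f = choose_a * det_a + (1 - choose_a) * det_b
    return np.clip(base_f + det_f, 0, 255)
```

Fusing an image with itself returns the image unchanged, which is a useful sanity check for any such decomposition-reconstruction pipeline.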

Discussion on Time Efficiency
In this section, we compare the time efficiency of the proposed method with those of the other nine methods on grayscale images. As shown by the results in Table 5, the deep-learning methods (Zero-LF, IFCNN, SDNet, EMFusion, and U2Fusion) train their models in advance, allowing them to process the images quickly. The ILLF method had the longest running time because the ILLF filter is not as fast as the other multi-scale tools and computes the decomposition of the image at several scales. The LDR algorithm spent too long on gradient-domain image enhancement owing to its over-reliance on the fitting function. NSST-PCNN also required more time than our method because of the PCNN iterations involved. Although the proposed method was not the fastest, considering its high performance, it is still effective. Moreover, we believe that if the code is fully optimized and ported to tools such as the graphics processing unit (GPU) and C++, the running time will be significantly shorter, enabling the method to satisfy the requirements of more applications.
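A runtime comparison such as Table 5 can be produced with a simple timing harness; the sketch below is a hypothetical helper (not the paper's benchmarking code) that averages wall-clock time of any fusion callable over a set of image pairs:

```python
import time
import numpy as np

def time_method(fuse_fn, pairs, repeats=3):
    """Average wall-clock runtime (seconds per image pair) of a fusion
    function. `fuse_fn` is any callable taking (a, b) and returning the
    fused image; `pairs` is a list of (a, b) source-image pairs."""
    t0 = time.perf_counter()
    for _ in range(repeats):
        for a, b in pairs:
            fuse_fn(a, b)
    return (time.perf_counter() - t0) / (repeats * len(pairs))
```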

Extension to Infrared (IR) and Visible (VIS) Image Fusion
To justify the generalization ability of the proposed method, we tested its fusion ability on ten sets of IR-VIS images (shown in Figure 8). Six advanced fusion methods for IR-VIS images were selected for comparison: visual saliency map and weighted least square optimization (VSWL) [43], Gaussian curvature filtering (GCF) [44], IFCNN [18], SDNet [40], U2Fusion [42], and SwinFusion [45]. As shown in Figure 9, although all seven methods could retain the energy in the IR image and the details in the VIS image, differences still existed. The red box with the pedestrian in the lower right corner of Figure 9 shows that although all seven methods captured the detail and contour information of the person in the source image, the overall brightness, specifically under the SDNet, U2Fusion, and SwinFusion methods, was low. In the fusion result of the proposed method, the person's edges did not appear as black shadows, owing to the smoothing of the edges of the salient target using the RGF. Second, regarding the poster board framed in green in the lower-left corner of the image in Figure 9, the VSWL, GCF, IFCNN, SDNet, U2Fusion, and SwinFusion methods did not retain the overall luminance information of the light sign. The above analysis shows that our algorithm had the best detail retention and color fidelity and was more consistent with subjective vision in processing object edges in an image.
Figure 10 shows the objective evaluation results for the "Queens Road, Bristol" image and the average objective evaluation of the ten IR-VIS images (the ten sets shown in Figure 8). The horizontal and vertical coordinates represent the different methods and the values of the different evaluation metrics, respectively. The red line shows the objective evaluation of the different methods on the image "Queens Road, Bristol", and the blue line shows the average objective evaluation of the different methods on the images in Figure 8. Regarding the objective evaluation assessment, our method ranked first in Q_MI, Q_NCIE, Q_M, EN, and the average VIF index for the ten IR-VIS images. Although its result for Q_P was not the highest, the difference from the best value was not pronounced. Thus, the validity of the proposed method in terms of objective assessment is confirmed. The above evaluation shows that our method can be effectively extended to IR-VIS image fusion.

Conclusions
In this study, we proposed a multi-sensor medical-image fusion method based on the BLF-LS and DSC. First, the decomposition of the image into a base layer using the BLF-LS was simple and effective, as it captured large changes in intensity, while the detail layer efficiently preserved details such as the structure, texture, and edges of the original image. The DSC model effectively detected the salient information, and the RGF made full use of the strong correlation between neighboring pixels for weight optimization, allowing the fused detail layer to effectively retain the salient structure and edge information of the source image. Finally, the base-layer fusion rule based on the MLEN effectively preserved the energy information of the source images.
The fusion results of the different methods on CT-MRI, PET-MRI, and MRI-SPECT images were demonstrated. The experimental results showed the advantages of the proposed method in both subjective visual and objective quantitative evaluations. Compared with the nine state-of-the-art methods used in this study, the proposed medical-image fusion algorithm provides fused images with clearer edge details, complete salient information, higher brightness, and superior colors. Additionally, the method is also applicable to IR-VIS image fusion. However, the proposed fusion method is easily affected by noise, and it assumes that the inputs are pre-aligned image pairs. In the future, we will work on reducing the effect of noise on the images, thereby bridging the gap between medical image fusion and actual clinical applications.