CSID: A Novel Multimodal Image Fusion Algorithm for Enhanced Clinical Diagnosis

Technology-assisted clinical diagnosis has gained tremendous importance in modern day healthcare systems. To this end, multimodal medical image fusion has gained great attention from the research community. There are several fusion algorithms that merge Computed Tomography (CT) and Magnetic Resonance Images (MRI) to extract detailed information, which is used to enhance clinical diagnosis. However, these algorithms exhibit several limitations, such as blurred edges during decomposition, excessive information loss that gives rise to false structural artifacts, and high spatial distortion due to inadequate contrast. To resolve these issues, this paper proposes a novel algorithm, namely Convolutional Sparse Image Decomposition (CSID), that fuses CT and MR images. CSID uses contrast stretching and the spatial gradient method to identify edges in source images and employs cartoon-texture decomposition, which creates an overcomplete dictionary. Moreover, this work proposes a modified convolutional sparse coding method and employs improved decision maps and the fusion rule to obtain the final fused image. Simulation results using six datasets of multimodal images demonstrate that CSID achieves superior performance, in terms of visual quality and enriched information extraction, in comparison with eminent image fusion algorithms.


Introduction
Image processing manipulates input source images to extract the maximum possible information. The information obtained is exploited in several applications, including remote sensing, malware analysis, and clinical diagnosis [1–5]. However, the latter requires particular attention, as enhanced clinical diagnosis remains a top priority around the world [6]. In this regard, clinical imaging plays a vital role in modern-day healthcare systems, where Computed Tomography (CT) and Magnetic Resonance Imaging (MRI) are among the most extensively used imaging modalities [7–9]. These modalities allow radiologists to analyze the human body and generate different patterns, which are used in clinical analysis [10]. The resulting images provide anatomical statistics [7]; however, the extraction of purposeful functional details from an individual image remains a critical issue. This demands multimodal image fusion, which integrates the complementary information of images from different modalities to produce an enhanced fused image, thereby providing enriched anatomical and functional information [6,7,11–13].
There are several multimodal image fusion algorithms [7,14,15], which are divided into two major classes, namely the spatial and transform domains [2]. Spatial-domain methods operate directly on pixel values, whereas transform-domain methods fuse coefficients in a transformed representation. The main contributions of this work are summarized as follows:

1. We employ contrast stretching and the spatial gradient method to extract edges from the input source images.

2. We propose the use of cartoon-texture decomposition, which creates an over-complete dictionary.

3. We propose a modified Convolutional Sparse Coding (CSC) method.

4. Our proposed algorithm uses enhanced decision maps and a fusion rule to obtain the final fused image.

5. Our simulation study reveals that the CSID algorithm achieves superior performance, in terms of visual quality and enriched information extraction, in comparison with other image fusion algorithms, as will be discussed in Section 5.
The rest of the paper is organized as follows. Section 2 critically reviews the eminent related work on multimodal image fusion. Section 3 details the proposed algorithm. Section 4 presents the objective evaluation metrics. Section 5 evaluates the performance of the proposed algorithm in comparison with state-of-the-art algorithms. Finally, Section 6 concludes the paper with a discussion of future research directions.

Related Work
Modern healthcare systems actively use multimodal image fusion for diagnosis [10]. This section critically reviews the eminent work on multimodal clinical image fusion.
Recently, Multi-Scale Transform (MST) and Sparse Representation (SR) techniques have gained significant popularity in the transform domain and have produced positive results in medical image analysis [26]. However, these methods have shortcomings: (i) the "max-l1" rule induces spatial inconsistency in the fused image when the source images are captured from different modalities [27], (ii) the MST-based filters used for SR-based image fusion [28] are time-consuming due to dictionary training and optimization, and (iii) these algorithms are unable to decompose several types of images [12]. Another challenge is the complicated oriented shape of source images, which cannot be precisely categorized through an already trained dictionary [28]. To address these issues, the authors in [29] propose a training model that employs the well-known K-means algorithm [30]. Research in the domain of multimodal image fusion has produced promising outcomes; however, several drawbacks remain. The authors in [15,25] propose neural network-based fusion algorithms that efficiently adjust and fit the training parameters, but these algorithms are not capable of representing information from multiple sources [31].
Moreover, learning-based algorithms have also been found useful in multimodal image fusion [32]. To this end, SR combined with learning-based multimodal medical image fusion strategies is gaining the interest of the research community [32]. The works in [23,27,33] employ SR-based algorithms for image fusion. Similarly, the authors in [34] propose an enhanced sparse representation orthogonal matching pursuit (SR-OMP) algorithm. Furthermore, the work in [24] presents another SR-based morphological component analysis model for pixel-level image fusion; however, the blurring effect during decomposition restricts the performance of the proposed model. The authors in [35] present a multimodal image fusion algorithm that employs SR-based cartoon-texture decomposition, but it also faces the blurring issue during decomposition, which results in considerable information loss. Additionally, the pyramid transformation algorithm [36] exhibits limited performance due to inaccuracy in capturing information and patch details. Arif et al. [37] present a multimodal image fusion algorithm based on the curvelet transform and a genetic algorithm (GA), where the GA resolves ambiguities and produces an optimized fused image. Kaur et al. [38] propose another image fusion algorithm that employs deep belief networks. Maqsood et al. [2] present a two-scale image decomposition technique, where a spatial gradient-based edge detection method acquires the detail layer and an SR rule constructs the fused image. This method produces improved fusion results; however, it still suffers from false structural artifacts in the fused image. Shahdoosti et al. [39] propose sparse representation in the tetrolet domain for medical image fusion; however, this approach falls victim to overlapping artifacts in the fused images.
The literature survey shows that SR-based image fusion algorithms offer better information extraction than other fusion algorithms. However, several issues require urgent attention: (i) the blurring effect near strong edges during image decomposition, (ii) the appearance of false structural artifacts in the fused image, and (iii) reduced contrast that results in high spatial distortion. To resolve these issues, we propose a novel image fusion algorithm, detailed in the following section.

The Proposed Convolutional Sparse Image Decomposition (CSID) Algorithm
This section presents our proposed novel algorithm for multimodal image fusion, namely, Convolutional Sparse Image Decomposition (CSID). CSID comprises six phases that include contrast enhancement, edge detection, cartoon and texture decomposition, enhanced CSC-based sparse coding, sparse coefficient maps fusion, and fused image reconstruction, as depicted in Figure 1. These phases are detailed in the following subsections.

Contrast Enhancement
Contrast enhancement is a major preprocessing step in image fusion for diagnostic processes [40]. No-Reference Image Quality Assessment (NR-IQA) and Non-Parametric Histogram Equalization (NPHE) are commonly used for contrast enhancement [41,42]. NR-IQA employs histogram equalization and uses the structural-similarity index metric to generate images with enhanced contrast, whereas NPHE employs modified spatial transformation-based adaptive contrast enhancement. However, these techniques require manual parameter tuning, which limits their ability to accurately reflect the contrast of an input image. To resolve these issues, CSID employs the Bio-Inspired Multi-Exposure Fusion (BIMEF) framework [40], which improves contrast and preserves the mean brightness of the source images. BIMEF uses illumination estimation to construct a weighted matrix. Thus, we start with the detection of optimal exposures using a camera response model, thereby producing synthetic images that are better exposed in under-exposed regions than the source images. Furthermore, we apply the weight matrix, with an appropriate exposure, to the synthetic image, which is then merged with the source image for contrast enhancement. Here, to conserve the contrast of an image, the weighted matrix is tied to the scene brightness and computed as follows [40]:

W(p, q) = A(p, q)^φ,

where A denotes the scene brightness map and φ is a parameter managing the degree of enhancement. Moreover, since regional maxima are better identified using a max function than a min function [43,44], CSID computes the dark-region map R from the initial brightness estimate at each pixel as [40]

R(p, q) = max_{c ∈ {R, G, B}} I_c(p, q),

where I_c denotes the c-th color channel of the input image.
Since absolute brightness has local consistency for boundaries with the same structures, A eliminates the textural edges and builds a significant structure of the image. CSID optimizes A as in [40], where ||·||_1 and ||·||_2 represent the l1 and l2 norms, respectively, and ∇ is the first-order derivative filter with horizontal (∇_h A) and vertical (∇_v A) components. ϕ denotes the coefficient and W refers to the weighted matrix. The weighted matrix is further refined to obtain significant edges in the image, where |·| denotes the absolute value operator, d(p, q) is the neighborhood window centered at pixels p and q, and κ represents a small constant that avoids a zero denominator. Moreover, we use (5) to evaluate A_i, which minimizes complexity. A_i is then applied to the source image to generate the final outcome of this phase, i.e., an image with enhanced contrast.
The results of contrast enhancement are illustrated in Figure 2. After contrast enhancement, CSID enters the second phase, which is detailed in the following subsection.
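As an illustrative sketch of this phase, the exposure-weighted fusion idea of BIMEF can be written in Python (the paper's experiments use MATLAB, and the full optimization in [40] is more involved); the function name, the fixed gamma-style exposure, and the parameter defaults below are our own simplifications.

```python
import numpy as np

def enhance_contrast(img, phi=0.5, eps=1e-3):
    """Sketch of BIMEF-style contrast enhancement for a grayscale image.

    img : float array in [0, 1]. `phi` controls the degree of enhancement,
    mirroring the parameter described in the text. This is a simplified
    illustration, not the full optimization from the BIMEF framework.
    """
    # Initial brightness estimate A: for grayscale, the pixel itself; for
    # RGB this would be the per-pixel channel maximum (the dark-region map R).
    A = img if img.ndim == 2 else img.max(axis=2)
    # Weight matrix tied to scene brightness: W = A**phi.
    W = np.power(np.clip(A, eps, 1.0), phi)
    # Synthetic better-exposed image (a simple gamma-style exposure here,
    # standing in for the camera-response-model synthesis).
    synthetic = np.power(img, 0.6)
    # Merge the source and synthetic images using the weight matrix.
    return W * img + (1.0 - W) * synthetic
```

Dark pixels (small W) lean on the better-exposed synthetic image, while bright pixels keep their original values, which preserves mean brightness.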

Edge Detection
Edge detection finds the boundaries of objects in an image by identifying brightness discontinuities [45]. To this end, CSID performs edge detection on the contrast-enhanced image using the Sobel operator [45]. Edge detection yields better performance when applied to images with enhanced contrast than to the original source images, as demonstrated in Figure 3.

Figure 3. Edge detection results, where (a,b) are the original source images, (c,d) represent the gradients of (a,b) using the Sobel method, (e,f) show the images with enhanced contrast using BIMEF, and (g,h) represent the gradients of (e,f) using the Sobel method.
For edge detection, CSID approximates the image gradient, where each location holds either the corresponding gradient vector or its norm. The image I is convolved with the horizontal kernel to obtain the gradient along the X coordinate as

G_x = [[-1 0 +1], [-2 0 +2], [-1 0 +1]] * I,   (6)

Similarly, the gradient along the Y coordinate is obtained by convolving the vertical kernel from top to bottom as

G_y = [[-1 -2 -1], [0 0 0], [+1 +2 +1]] * I,   (7)

Furthermore, the image gradient vectors obtained using (6) and (7) are combined to find edges as

G = sqrt(G_x^2 + G_y^2).   (8)

Figure 3 depicts a comparison of edge information in the source and enhanced images obtained after completion of the first two phases of our proposed CSID algorithm. Figure 3a,b present the source CT and MRI images, respectively, while their edge maps are demonstrated in Figure 3c,d, respectively. Furthermore, Figure 3e,f include the improved CT and MRI images obtained after contrast enhancement (as detailed in Section 3.1) and their respective gradient maps are shown in Figure 3g,h. Here, improved edge detection is observed in the image with enhanced contrast in comparison with edge detection in the original source image. On completion of this phase, CSID proceeds to the third phase, which is discussed in the following subsection.
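The Sobel gradient computation of (6) and (7), followed by the gradient-magnitude step, can be sketched as follows (Python rather than the paper's MATLAB; the zero-padded convolution helper is our own):

```python
import numpy as np

def sobel_edges(img):
    """Sobel gradient magnitude, as used by CSID for edge detection.

    `img` is a 2-D float array. Returns G = sqrt(Gx**2 + Gy**2), where
    Gx and Gy are the horizontal and vertical Sobel responses.
    """
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T  # vertical kernel is the transpose of the horizontal one

    def conv2(a, k):
        # 'same'-size 2-D convolution with zero padding.
        p = np.pad(a, 1)
        out = np.zeros_like(a, dtype=float)
        for i in range(3):
            for j in range(3):
                out += k[2 - i, 2 - j] * p[i:i + a.shape[0], j:j + a.shape[1]]
        return out

    gx, gy = conv2(img, kx), conv2(img, ky)
    return np.sqrt(gx ** 2 + gy ** 2)
```

A constant region gives zero response (the kernel weights sum to zero), while a step edge produces a strong ridge in the magnitude map.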

Cartoon and Texture Decomposition
Cartoon-texture decomposition divides an image into geometric cartoon and texture components, which removes background interference. To this end, we propose a modification to the legacy Convolutional Sparse Coding (CSC) model [46]. In our modified CSC model, a similarity threshold is maintained to compute the residual correlation similarity between the selected sparsest coefficients and the other coefficients in the sparse coding phase, which is discussed in Section 3.4. This expands the coefficient set, thereby obtaining more suitable coefficients for SR. Furthermore, it also accelerates the sparse coding process, and the residual correlation similarity easily minimizes the target error for each image patch signal. Moreover, multiple similar patches are used to represent a single image patch, which further enhances the fused image by avoiding the blurring effect. An optimized solution of the CSC problem is generated as

min over {ϕ_c,u}, {ϕ_t,u} of (1/2) || Σ_{u=1}^{U_c} h_{c,u} * ϕ_{c,u} + Σ_{u=1}^{U_t} h_{t,u} * ϕ_{t,u} − I ||_2^2 + ν_c Σ_{u=1}^{U_c} ||ϕ_{c,u}||_1 + ν_t Σ_{u=1}^{U_t} ||ϕ_{t,u}||_1,   (9)

where q_c = {h_{c,u}}_{u=1}^{U_c} and q_t = {h_{t,u}}_{u=1}^{U_t} represent the sets of dictionary filters for SR of the cartoon and texture components, ϕ_{c,u} and ϕ_{t,u} are the sparse coefficients that estimate q_c and q_t when convolved with the filters {h_{c,u}} and {h_{t,u}}, respectively, and ν_c and ν_t are the positive regularization parameters. The optimization problem is solved iteratively over ϕ_{c,u} and ϕ_{t,u}. With ϕ_{t,u}, h_{t,u}, and h_{c,u} fixed, the subproblem is solved for the updated ϕ_{c,u} as

min over {ϕ_c,u} of (1/2) || Σ_{u=1}^{U_c} h_{c,u} * ϕ_{c,u} − (I − Σ_{u=1}^{U_t} h_{t,u} * ϕ_{t,u}) ||_2^2 + ν_c Σ_{u=1}^{U_c} ||ϕ_{c,u}||_1.   (10)

Similarly, keeping ϕ_{c,u} fixed, the subproblem for the updated ϕ_{t,u} is solved as

min over {ϕ_t,u} of (1/2) || Σ_{u=1}^{U_t} h_{t,u} * ϕ_{t,u} − (I − Σ_{u=1}^{U_c} h_{c,u} * ϕ_{c,u}) ||_2^2 + ν_t Σ_{u=1}^{U_t} ||ϕ_{t,u}||_1.   (11)

The Alternating Direction Method of Multipliers (ADMM)-based CSC [47] is used to solve the two subproblems in (10) and (11). This completes the cartoon and texture decomposition phase and allows CSID to proceed to the next phase, which is detailed in the following subsection.
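For illustration, a greatly simplified Python sketch of convolutional sparse coding for one component is given below. It replaces the ADMM solver of [47] with a plain ISTA loop and omits the residual-correlation similarity threshold of our modified model; all function names and parameter values are ours, not the paper's.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of the l1 norm, the core of sparse-coding updates."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def csc_ista(img, filters, nu=0.05, step=0.1, iters=50):
    """Toy convolutional sparse coding via ISTA, using FFT-domain convolutions.

    Approximately minimizes
        0.5 * || sum_u h_u * phi_u - img ||_2^2 + nu * sum_u ||phi_u||_1,
    a simplified stand-in for the ADMM-based CSC subproblems.
    `filters` is a list of small 2-D kernels; coefficient maps share the
    image's shape.
    """
    H = [np.fft.fft2(h, s=img.shape) for h in filters]
    phis = [np.zeros(img.shape) for _ in filters]
    for _ in range(iters):
        # Current reconstruction and residual.
        recon = sum(np.real(np.fft.ifft2(Hu * np.fft.fft2(p)))
                    for Hu, p in zip(H, phis))
        resid = recon - img
        for u, Hu in enumerate(H):
            # Gradient step through the transposed convolution, then shrink.
            grad = np.real(np.fft.ifft2(np.conj(Hu) * np.fft.fft2(resid)))
            phis[u] = soft_threshold(phis[u] - step * grad, step * nu)
    return phis
```

The soft-thresholding step is what makes the coefficient maps sparse; in the full model this alternation is run once for the cartoon filters and once for the texture filters.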

Enhanced CSC-Based Sparse Coding
CSID employs our modified CSC model for cartoon and texture layer decomposition using {ϕ_{c,u}}_{u=1}^{U_c} and {ϕ_{t,u}}_{u=1}^{U_t}, which represent the sparse coefficient vectors of the cartoon and texture components, respectively. The same coefficient vectors are used to evaluate the sparse coefficient maps in the next phase of our proposed CSID algorithm, as detailed in the following subsection.

Sparse Coefficient Maps Fusion
CSID applies the l1-norm of the sparse coefficient maps as the activity level measurement of the enhanced images, a common approach in several SR-based image fusion techniques [2,48,49]. Let j (j ∈ {c, t}) index the cartoon and texture components, and let ϕ^n_{j,1:U_j}(p, q) denote the U_j-dimensional vector consisting of the coefficients of ϕ^n_{j,u} at position (p, q). The initial activity level map ζ^n_j(p, q) is obtained as

ζ^n_j(p, q) = || ϕ^n_{j,1:U_j}(p, q) ||_1.

A window-based averaging scheme is then applied for noise removal and enhanced robustness to misregistration. Thus, the final activity level map is computed by averaging ζ^n_j over an m_j × m_j neighborhood, where m_c and m_t refer to the window sizes for the cartoon and texture components, respectively. Finally, CSID employs the "choose-max" rule, which at each position retains the coefficients of the source image with the largest averaged activity level, to obtain the fused coefficient maps {ϕ^F_{j,u}} with j ∈ {c, t}. This completes the sparse coding phase and leads CSID to the final phase, which is detailed in the following subsection.
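The activity-level and choose-max steps of this phase can be sketched as follows (Python; the array shapes, names, and default window size are our own assumptions):

```python
import numpy as np

def fuse_coefficient_maps(phi_a, phi_b, window=3):
    """Fuse two stacks of sparse coefficient maps with the 'choose-max' rule.

    phi_a, phi_b : arrays of shape (U, H, W) holding the U coefficient maps
    of each source image for one component (cartoon or texture). The l1 norm
    over the U axis gives the initial activity map; a box average over a
    `window`-sized neighborhood adds robustness to misregistration.
    """
    def activity(phi):
        zeta = np.abs(phi).sum(axis=0)       # l1-norm activity level
        pad = window // 2
        z = np.pad(zeta, pad, mode='edge')   # window-based averaging
        out = np.zeros_like(zeta)
        for i in range(window):
            for j in range(window):
                out += z[i:i + zeta.shape[0], j:j + zeta.shape[1]]
        return out / (window * window)

    # Choose-max decision map: keep the source with the higher activity.
    mask = activity(phi_a) >= activity(phi_b)
    return np.where(mask[None], phi_a, phi_b)
```

At every pixel the coefficients of whichever source shows the stronger averaged activity are carried into the fused maps.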

Fused Image Reconstruction
This phase fuses the enhanced CT and MRI images obtained from the aforementioned phases and completes the multimodal fusion process of our proposed CSID algorithm. The cartoon component includes edges and round, anisotropic structural parts, whereas the texture component contains detailed texture information, periodic behaviors, and several levels of noise data. This enables the proposed CSID algorithm to surpass the limitations of the existing fusion techniques (as detailed in Section 2) by allowing the reconstruction of lost information in the CT and MRI images. The reconstruction process results in sharper images with enriched information. The fusion of such enhanced images improves accuracy during clinical diagnosis. The next section discusses the objective evaluation metrics used for the performance evaluation of our proposed CSID algorithm.
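A minimal sketch of the reconstruction step, assuming the fused image is obtained by convolving each dictionary filter with its fused coefficient map and summing the cartoon and texture components (Python; function and argument names are ours):

```python
import numpy as np

def reconstruct(fused_cartoon, fused_texture, h_c, h_t):
    """Rebuild the fused image from fused coefficient maps.

    Each component is the sum of its dictionary filters convolved with the
    corresponding fused coefficient maps; the cartoon and texture parts are
    then added. `h_c` and `h_t` are lists of small kernels; the coefficient
    maps have the image's shape.
    """
    def synthesize(maps, filters):
        shape = maps[0].shape
        # Circular convolution via FFT, matching the coding-stage operator.
        return sum(np.real(np.fft.ifft2(np.fft.fft2(h, s=shape) *
                                        np.fft.fft2(m)))
                   for m, h in zip(maps, filters))

    return synthesize(fused_cartoon, h_c) + synthesize(fused_texture, h_t)
```

With a single delta filter and zero texture maps, the reconstruction reduces to the cartoon map itself, which is a convenient sanity check.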

Objective Evaluation Metrics
The objective performance evaluation metrics include mutual information, entropy, feature mutual information, spatial structural similarity, and visual information fidelity, which are used by state-of-the-art works [27,50–53]. These metrics are defined in the following subsections.

Mutual Information (MI)
MI [50] computes the common information between two discrete variables as follows:

MI = Σ_l Σ_m H_ij(l, m) log2( H_ij(l, m) / (H_i(l) H_j(m)) ),

where H_ij(l, m) denotes the joint probability density distribution of the grayscale images i and j, and H_i(l) and H_j(m) refer to the probability density distributions of the grayscale images i and j, respectively. The reported MI is the sum of the mutual information between each source image and the fused image. A larger MI value indicates that more information has been extracted from the input source images.
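A straightforward Python implementation of MI from the joint histogram follows (the bin count and intensity range are our assumptions for 8-bit images):

```python
import numpy as np

def mutual_information(img_i, img_j, bins=256):
    """MI between two grayscale images, estimated from the joint histogram.

    H_ij is the joint probability distribution; H_i and H_j are its
    marginals. A larger MI means more source information is carried into
    the fused image.
    """
    joint, _, _ = np.histogram2d(img_i.ravel(), img_j.ravel(),
                                 bins=bins, range=[[0, 256], [0, 256]])
    H_ij = joint / joint.sum()
    H_i, H_j = H_ij.sum(axis=1), H_ij.sum(axis=0)
    nz = H_ij > 0  # skip empty cells to avoid log(0)
    return float((H_ij[nz] *
                  np.log2(H_ij[nz] /
                          (H_i[:, None] * H_j[None, :])[nz])).sum())
```

An image paired with itself yields its own entropy, while shuffling one image destroys the dependence and drives the estimate down.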

Entropy (EN)
EN [27] measures the randomness of information in a sample and is expressed as

EN = − Σ_{l=0}^{N−1} H_i(l) log2 H_i(l),

where N is the number of gray levels, which is taken as 256 in this work, and H_i(l) is the normalized histogram of the fused image i.
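A minimal Python implementation of EN as defined above (the 256-level range matches the paper's setting):

```python
import numpy as np

def entropy(img, levels=256):
    """Shannon entropy of the image's normalized gray-level histogram."""
    hist, _ = np.histogram(img.ravel(), bins=levels, range=(0, levels))
    p = hist / hist.sum()
    p = p[p > 0]  # 0 * log(0) is taken as 0
    return float(-(p * np.log2(p)).sum())
```

A constant image has zero entropy; an image split evenly between two gray levels has exactly one bit.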

Feature Mutual Information (FMI)
FMI [51] computes the mutual information between the features of the source and fused images. This non-reference performance metric for fusion methods is computed over sliding windows, where N represents the number of windows, S_l(m) is the entropy of the lth window of image m, I_l(m, i) refers to the regional mutual information between the lth windows of images m and i, and, similarly, I_l(m, j) is the regional MI between the lth windows of images m and j.

Spatial Structural Similarity (Q^AB/F)
Q^AB/F [52] is an edge-based fusion quality evaluation metric, which determines the quantity of edge information transferred into the fused image from the input images. Q^AB/F for a set of source images is computed from Q^AB/F(i, j), the information transferred from a source image into the fused image at pixel location (i, j), weighted by W^B(i, j), the weight for pixel location (i, j). Here, pixels with a higher gradient value influence Q^AB/F more than pixels with a lower gradient value; thus, W^A(i, j) = [Grad(i, j)]^T, where T remains constant.

Visual Information Fidelity (VIF)
VIF [53], a perceptual distortion metric, is an important index for image quality assessment. In the context of image fusion, VIF evaluates performance by calculating the common information between a source image and the corresponding fused image. Since VIF provides accurate distortion identification, this work takes the average VIF value for the performance evaluation of the given set of algorithms, as will be discussed in Section 5.

Simulation Setup
Simulation results are derived using MATLAB R2020b (MathWorks Inc., MA, USA), which is widely used by state-of-the-art methods for multimodal image fusion due to its extensive built-in library support [2,23–25]. The hardware platform includes an Intel Core i7-9750H 2.59 GHz processor with 16 GB memory running Microsoft Windows 10 (Microsoft, WA, USA). The multimodal brain image datasets (Data-1 through Data-6), composed of CT and MR images, are obtained from [56]. For performance evaluation, 500 grayscale images are selected from each of the aforementioned datasets. Input image dimensions are standardized to 256 × 256 pixels. Both qualitative and quantitative analyses are performed for the performance evaluation, as detailed in the following subsections.

Results and Discussion
Six different datasets of multimodal images, referred to as Data-1 through Data-6, are used in the simulations. Figure 4 depicts sample images from these datasets. The fusion results generated by our proposed CSID algorithm and the aforementioned eminent fusion algorithms are shown in Figures 5-10. Each presented result is averaged over 20 replicated simulation runs, keeping all parameters fixed and changing the random seed values. The following subsections demonstrate and discuss the obtained results.

Qualitative Analysis of the Given Set of Algorithms for Multimodal Fusion
This subsection presents the results based on visual observations of the images generated through our proposed CSID algorithm in comparison with the aforementioned algorithms using different datasets, i.e., Data-1 through Data-6. Visual quality comparison of the Data-1 dataset using different fusion methods, i.e., DWT, DTCWT, LP, GFF, NSCT, NSST-PAPCNN, CSR, CSMCA, CNN, and the proposed algorithm are shown in Figure 5a through Figure 5j, respectively. A CT image gives information about hard tissues and their structures, whereas an MRI image indicates information regarding soft tissues. For better diagnosis, it is essential to merge critical information of the aforementioned images into one fused image [12]. In this regard, the aforementioned set of algorithms perform multimodal image fusion.
The qualitative results shown in Figure 5 depict inferior performance, in terms of contrast and visual effect, for DWT (Figure 5a), DTCWT (Figure 5b), NSCT (Figure 5e), and CSR (Figure 5g). Note that these algorithms are not capable of preserving information in the fused image, which relates to the objective evaluation metric MI, as MI remains proportional to the level of information extraction. Section 5.2.2 further validates this claim through quantitative analysis, where DWT, DTCWT, NSCT, and CSR exhibit lower MI scores in comparison with the other algorithms. Moreover, GFF (Figure 5d) and NSST-PAPCNN (Figure 5f) yield better results than DWT, DTCWT, NSCT, and CSR by avoiding information loss; however, the lack of noise removal results in over-enhancement of the structural features in these algorithms. CSMCA (Figure 5h) and CNN (Figure 5i) further improve the visual quality, where the enhanced visualization is an outcome of lower information loss. Finally, our proposed CSID algorithm (Figure 5j) yields a clear, high-contrast image of superior visual quality and preserves the salient features, including considerably enhanced bone structure and soft tissue information, in comparison with the other given algorithms.

The given set of algorithms is then evaluated using the Data-2 dataset, with the corresponding qualitative results shown in Figure 6. CNN (Figure 6i) also does not remain effective in transferring information from the source images. Section 5.2.2 provides quantitative analysis with respect to the given objective evaluation metrics (as discussed in Section 4) that affirms the aforementioned statements. Moreover, in addition to MI, the FMI and Q^AB/F evaluation metrics also remain critical, as they relate to accuracy in the resultant fused images. Although GFF, NSST-PAPCNN, and CSMCA provide better results than DWT, NSCT, and CSR by conveying complementary information into the fused image, these algorithms lack accuracy (as discussed in Section 5.2.2 through quantitative analysis).
Note that our proposed algorithm (Figure 6j) provides better visual effects in comparison with the other aforementioned algorithms, due to its improved information extraction and edge detection abilities.

The given set of algorithms is then evaluated using the Data-3 dataset (Figure 7). Some algorithms, such as CNN (Figure 7i), remain slightly better; however, these algorithms are still ineffective in reducing the information loss considerably. The results of our proposed CSID algorithm (Figure 7j) are found to be the best among all the aforementioned algorithms, as it efficiently preserves edges and texture details.

For the Data-4 dataset (Figure 8), several algorithms, such as CSMCA (Figure 8h), show smaller Q^AB/F scores, which impacts the sharpness of the resultant fused images. Moreover, these algorithms also exhibit distorted regions due to lower VIF scores in comparison with the other algorithms, as discussed in the quantitative analysis in Section 5.2.2. Additionally, LP (Figure 8c) and CSR (Figure 8g) are found incapable of retaining originality due to increased information loss. NSCT (Figure 8e) and CNN (Figure 8i) provide comparatively improved results, as these algorithms achieve effective information integration. Here again, CSID accomplishes the best performance in comparison with the aforementioned algorithms, as it remains capable of transferring more details and provides better contrast. Furthermore, the aforementioned set of algorithms is evaluated using the Data-5 dataset, and the corresponding qualitative results are shown in Figure 9.
Finally, CSID and the aforementioned set of algorithms are evaluated using the Data-6 dataset, where Figure 10 demonstrates the corresponding qualitative results. All the algorithms, other than CSID, are unable to extract detailed information, which results in blurred fused images. In contrast, our proposed CSID algorithm shows improved edge detection and provides enhanced contrast in comparison with all the aforementioned algorithms, thereby yielding better visualization.

Quantitative Analysis of the Given Set of Algorithms for Multimodal Fusion
Tables 1 and 2 show the results obtained from DWT, DTCWT, LP, GFF, NSCT, NSST-PAPCNN, CSR, CSMCA, CNN, and CSID against the objective metrics MI, EN, FMI, Q^AB/F, and VIF (as detailed in Section 4). The scores obtained for these metrics remain proportional to the quality of the resultant fused image: a smaller score indicates missing information and false structural artifacts during the fusion process, whereas a higher score corresponds to an enhanced fused image. The highest scores obtained are highlighted in bold in Tables 1 and 2. These results demonstrate that our proposed CSID achieves higher MI, EN, FMI, Q^AB/F, and VIF scores in comparison with all the other image fusion algorithms across the different datasets, i.e., Data-1 through Data-6. This indicates improved performance for CSID, as it remains capable of extracting enriched information from the input source images, thereby preserving enhanced edge details and yielding enhanced visual quality.

In the past few decades, non-invasive applications (like multimodal fusion) have gained tremendous popularity among healthcare professionals, adding ease and accuracy to the diagnostic process [57,58]. CSID aims to enhance clinical diagnostics by improving multimodal fusion. We acquired the expert opinion of two healthcare professionals (one radiologist and one physician, whose help we gratefully acknowledge) based upon visual observation of the resultant fused images generated through the given set of algorithms.
These experts appreciated the enhanced results generated by CSID in comparison with the other state-of-the-art algorithms. Furthermore, the experts added that CSID enables detailed information extraction along with clearer edge detection to yield enhanced fused images, which is promising for better clinical diagnosis.

Statistical Analysis of the Results
We used the non-parametric Friedman test and the post-hoc Nemenyi test to analyze how the evaluated methods differ from each other. The Nemenyi test calculates a Critical Difference (CD) using the studentized range (Tukey) distribution, and any difference between method ranks that is greater than the CD is considered significant [59]. In Figure 11, we used the values from Tables 1 and 2 to calculate the average ranks of the fusion methods. The results of the Nemenyi test show that the proposed CSID method achieved better values than all other methods when evaluated in terms of the MI, EN, FMI, Q^AB/F, and VIF scores for the six datasets (Data-1 through Data-6); however, the advantage over the next best method, CNN [25], was not statistically significant (difference between mean ranks < 2.4734, Friedman's p < 0.001).

Figure 11. Results of the Nemenyi test on the different scores evaluating the fusion methods using six datasets (Data-1 through Data-6). CSID is the proposed method.
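The ranking procedure above can be sketched in Python (function and variable names are ours; the default q_alpha of 3.164 is the alpha = 0.05 studentized-range value for ten methods from the standard Nemenyi tables, and ties are ignored for simplicity):

```python
import numpy as np

def friedman_nemenyi(scores, q_alpha=3.164):
    """Mean ranks, Friedman chi-square statistic, and Nemenyi CD.

    scores : (N, k) array with one row per dataset/metric combination and
    one column per method (higher is better). Returns (mean_ranks, chi2, cd).
    Ties are not averaged here, which is a simplification of the real test.
    """
    N, k = scores.shape
    # Rank within each row: rank 1 = best (highest score).
    ranks = np.argsort(np.argsort(-scores, axis=1), axis=1) + 1.0
    mean_ranks = ranks.mean(axis=0)
    # Friedman statistic: chi2 = 12N/(k(k+1)) * (sum r_j^2 - k(k+1)^2/4).
    chi2 = 12.0 * N / (k * (k + 1)) * (np.sum(mean_ranks ** 2)
                                       - k * (k + 1) ** 2 / 4.0)
    # Nemenyi critical difference: CD = q_alpha * sqrt(k(k+1)/(6N)).
    cd = q_alpha * np.sqrt(k * (k + 1) / (6.0 * N))
    return mean_ranks, chi2, cd
```

Two methods whose mean ranks differ by more than the returned CD would be declared significantly different at the chosen alpha.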

Computational Efficiency
This subsection evaluates the computational efficiency of our proposed CSID algorithm in comparison with DWT, DTCWT, LP, GFF, NSCT, NSST-PAPCNN, CSR, CSMCA, and CNN. Tables 1 and 2 show the execution time (in seconds) for each of the aforementioned algorithms when applied to the given datasets, i.e., Data-1 through Data-6. The results demonstrate that LP exhibits the smallest execution time among the aforementioned algorithms, whereas CSMCA bears the highest. Our proposed CSID algorithm has a smaller execution time than DTCWT, NSST-PAPCNN, CSR, and CSMCA, and a higher execution time than DWT, LP, GFF, and NSCT. This is because CSID employs cartoon-texture component gradient-based feature extraction that enhances image visualization, as shown in Section 5.2.1. Since the main aim of this work is to enhance visualization, a slight increase in execution time remains an affordable tradeoff. Execution time minimization will be taken up as a future extension of this work.

Conclusions
Multimodal medical image fusion has gained a firm stature in modern-day healthcare systems. There are several fusion algorithms that merge multiple input source images to extract detailed information, which is exploited to enhance clinical diagnosis. However, these algorithms have several limitations, such as blurred edges during decomposition, excessive information loss that results in false structural artifacts, and high spatial distortion due to inadequate contrast. This work resolves the aforementioned issues by proposing a novel CSID algorithm that performs contrast stretching and identifies edges using the spatial gradient. CSID employs cartoon-texture decomposition, which creates an overcomplete dictionary. Moreover, this work proposes a modification to the legacy convolutional sparse coding method and employs enhanced decision maps and a fusion rule to obtain the final fused image. Simulation results demonstrate that CSID attains improved performance, in terms of visual quality and enriched information extraction, compared with other known fusion algorithms. Future work will aim at reducing the execution time of CSID to enable rapid image fusion. Furthermore, extending CSID to other applications, such as visible-infrared, multi-exposure, and multi-focus image fusion, is another future research direction.