Article

Artificial Intelligence-Based Multimodal Medical Image Fusion Using Hybrid S2 Optimal CNN

by Marwah Mohammad Almasri 1,* and Abrar Mohammed Alajlan 2
1 Department of Computer Science, College of Computing and Informatics, Saudi Electronic University, Riyadh 11673, Saudi Arabia
2 Self Development Skills Department, Common First Year Deanship, King Saud University, Riyadh 11451, Saudi Arabia
* Author to whom correspondence should be addressed.
Electronics 2022, 11(14), 2124; https://doi.org/10.3390/electronics11142124
Submission received: 18 February 2022 / Revised: 7 May 2022 / Accepted: 16 May 2022 / Published: 6 July 2022
(This article belongs to the Topic Medical Image Analysis)

Abstract
In medical applications, medical image fusion methods are capable of fusing medical images from various morphologies to obtain a reliable medical diagnosis. A single-modality image cannot provide sufficient information for an exact diagnosis. Hence, an efficient artificial intelligence model based on multimodal medical image fusion is proposed in this paper. Initially, the multimodal medical images are fused using a modified discrete wavelet transform (MDWT), thereby attaining an image with high visual clarity. Then, the fused images are classified as malignant or benign using the proposed convolutional neural network-based hybrid optimization dynamic algorithm (CNN-HOD). To enhance the weight function and classification accuracy of the CNN, a hybrid optimization dynamic (HOD) algorithm is proposed. The HOD is the integration of the sailfish optimizer algorithm and the seagull optimization algorithm, in which the migration operation of the seagull optimization algorithm replaces the elitism operation to obtain the optimal location. The experimental analysis is carried out, and the proposed method achieves a standard deviation (58%), average gradient (88%), and fusion factor (73%) compared with the other approaches. The experimental results demonstrate that the proposed approach performs better than other approaches and offers high-quality fused images for an accurate diagnosis.

1. Introduction

Image fusion represents the procedure of combining the diverse images obtained by different modalities. It is widely utilized in disease diagnosis, surgery, and treatment planning. Various classes such as bones, organs, or tissues are reflected in several medical images with several modalities. The fusion process in medical applications is utilized for image correctness and for the detection as well as assessment of medical issues by preserving and highlighting the relevant attributes [1]. In the medical field, a computed tomography (CT) image captures dense structures such as bone better than magnetic resonance imaging (MRI) does in fracture examination. MR imaging provides information related to the soft tissues, whereas functional modalities such as single-photon emission CT (SPECT) and positron emission tomography (PET) reflect absorption and disease progression. PET offers highly sensitive images, and SPECT is a nuclear imaging method used for exploring the flow of blood in organs as well as tissues [2]. The major application of this fusion is to extract medical information from various sensors that is normally not visible in a single image. Biomedical sensors such as ultrasound, PET, MRI, X-ray, and CT provide clinical information using the reflection of human body organs [3]. To obtain proper information for an accurate detection, clinicians usually need to merge various kinds of medical images of an identical location to determine the causes of a patient's issues. Image fusion methods offer an efficient scheme for resolving these problems. Medical image fusion methods fuse multi-modality medical images for accurate as well as reliable medical detection [4].
Image fusion is classified into three levels, namely, feature level, decision level, and pixel level. Feature-level fusion identifies feature specifications and their dissimilarities, such as color, shape, texture, and edges, and integrates the dissimilarities depending on feature resemblance [5]. In feature-level fusion, the features are extracted separately from every source image. Decision-level fusion is utilized to combine the higher-level outcomes from the various algorithms to obtain the final decision for the fusion procedure; every image is first processed independently and then provided to the fusion process. Decision-level fusion assigns the pixels from the various source images, depending on the extracted features, to the suitable class label for every pixel [6]. Pixel-level fusion is utilized to conserve the spatial features of the source image pixels, and numerous pixel-level fusion techniques have been proposed recently. Pixel-level fusion is classified into two types of methods depending on their modes, namely, transform domain-based and spatial domain-based image fusion methods, and image fusion has been developed through both the transform domain-based and the spatial domain-based methods [7].
Spatial domain-based fusion methods utilize local features such as the standard deviation, spatial frequency, and gradient of the source images. The commonly utilized techniques in the spatial domain include intensity hue saturation (IHS) and principal component analysis (PCA). The fused images achieved via these techniques generally suffer from high spatial distortion and low SNR. In the transform domain schemes, the source images are decomposed into expressive sub-bands to distinguish salient attributes such as edges and sharpness of the image. The standard transform domain fusion methods depend upon multi-resolution analysis (MRA) [8,9,10]. In medical applications, medical image fusion methods are capable of fusing medical images from various morphologies to obtain a reliable medical diagnosis. A single-modality image cannot provide sufficient information for an exact diagnosis. Hence, this paper proposes multimodal medical image fusion based on a CNN with a hybrid optimization dynamic (HOD) algorithm in the discrete wavelet transform domain. Initially, the multimodal medical images are decomposed by the MDWT, and optimization models are utilized to obtain the fused images. The fused images are then classified as malignant or benign using the CNN-HOD classifier. The main purpose of the HOD algorithm is to improve classification accuracy. The experimental results reveal that the proposed method performs better than the existing multimodal medical image fusion methods. The major contributions of the paper are as follows.
A modified discrete wavelet transform (MDWT) is utilized to decompose the images into low- and high-frequency sub-bands.
The fused images are classified as malignant or benign using the proposed convolutional neural network-based hybrid optimization dynamic algorithm (CNN-HOD).
The proposed approach is compared with various other image fusion-based techniques to evaluate the performance of the system.

2. Review of Biomedical Imaging Process

Yadav et al. [11] proposed hybrid discrete wavelet transform and principal component analysis (DWT-PCA) techniques for the process of medical image fusion using image modalities such as MRI, PET, SPECT, and CT. Poor image quality and inconsistent performance with low efficiency were the significant drawbacks of this approach. Subbiah et al. [12] proposed the enhanced monarch butterfly optimization algorithm and discrete shearlet transform with a restricted Boltzmann machine (EMBO-DST with RBM) approach for multimodal medical image fusion. The medical image fusion was achieved using four sets of benchmark database images (represented as D1, D2, D3, and D4), consisting of MRI, PET, SPECT, and CT images. This technique faced a few difficulties during the implementation process and failed to perform in real-time applications.
Wang et al. [13] proposed a convolutional neural network for fusing the pixel activity information of input source images to learn the creation of weight maps. Eight different image fusion methods were utilized, fusing images such as MRI, CT, T1, T2, PET, and SPECT. The major drawback of this method was that it was difficult to extend to infrared-visible and multi-focus image fusion. Parvathy et al. [14] proposed the discrete gravitational search algorithm (DGSA) with a deep neural network to improve the classification accuracy. The method utilized four datasets (I, II, III, and IV) that include modalities such as CT, SPECT, and MRI. The performance of the method was evaluated using measures such as sensitivity, accuracy, precision, specificity, fusion factor, and spatial frequency. The major difficulty of this method was implementing it in real-time applications.
Tan et al. [15] proposed a pulse-coupled neural network in a non-subsampled shearlet transform approach to improve the fusion quality of medical images. More than 100 pairs of multimodal medical images were obtained from the Whole Brain Atlas dataset, in which the modalities consist of MRI, PET, and SPECT. Li et al. [16] proposed a Laplacian re-decomposition (LRD) method to enhance multimodal medical image fusion quality. This method utilized 20 pairs of multimodal medical images collected from the Harvard University medical library. Arif et al. [17] proposed a novel method for the fusion of multimodal medical images that depends on the curvelet transform as well as a genetic algorithm (GA). The dataset was acquired at CMH Hospital Rawalpindi from modalities such as CT, MRI, PET, MRA, and SPECT. The major drawback of this method was determining the decomposition level. Kaur et al. [18] decomposed the images using the non-subsampled contourlet transform with multi-objective differential evolution for multimodality medical image fusion. The major drawback of this method was the difficulty of fusing remote sensing images.
Hu et al. [19] proposed a new fusion method that integrates separable dictionary optimization with a Gabor filter in the non-subsampled contourlet transform (NSCT) domain. The proposed method was tested on 127 groups of brain anatomical images from the Whole Brain Atlas medical image database with modalities such as MRI and CT images. The major drawback of this method was its high time consumption. Xia et al. [20] proposed a parameter-adaptive pulse-coupled neural network (PAPCNN) method to obtain a better fusion effect. The proposed method utilized 70 pairs of source images collected from the Whole Brain Atlas of Harvard Medical School [13] and The Cancer Imaging Archive (TCIA). The medical images were fused using modalities such as CT, MRI, T1, T2, PET, and SPECT. Table 1 depicts a summary of related works on multimodality medical image fusion.
Shehanaz et al. [21] suggested optimum weighted average fusion (OWAF) with particle swarm optimization (PSO) to enhance the performance of multimodal mapping. The multi-modality imaging pairs, namely, MR-CT, MR-SPECT, and MR-PET, were used for the evaluation of the OWAF method. The simulation setup was carried out using a public image dataset that contains normal and diseased brain images (http://www.med.harvard.edu/AANLIB/, accessed on 18 January 2022). The results showed that the OWAF-PSO method achieved higher fusion quality, but it took more computational time to perform the task.
To enhance the quality of fusion images, Dinh et al. [22] introduced a sum of local energy function with a Prewitt compass operator (SLE-PCO) along with an equilibrium optimizer algorithm (EOA). In this, SLE-PCO increases the contrast of the image and EOA prevents the loss of significant data. The performance of this approach was validated using MRI-PET medical images taken from the website http://www.med.harvard.edu/AANLIB/ (accessed on 18 January 2022). This approach efficiently improves low contrast medical images and conserves detailed layers of data, but the drawback was high computational complexity.
Dinh et al. [23] demonstrated a three-scale decomposition (TSD) technique, a fusion rule based on a local energy function using a Kirsch compass operator (FR-KCO), and a marine predators algorithm (MPA) to enhance image details, preserve significant data, and increase image quality, respectively. MRI-PET medical images from http://www.med.harvard.edu/AANLIB/ (accessed on 18 January 2022) were utilized to determine the performance of the approach. This method achieved higher fusion performance, while its limitation was low information entropy.

3. Proposed Methodology

The block schematic of the proposed multimodality medical image fusion is depicted in Figure 1. The information to be fused is obtained from sources acquired at various time intervals. In the first stage, the images are fused by applying the modified discrete wavelet transform. The input images are CT, MRI, PET, and SPECT images, obtained from online repositories or from scan centers. To achieve the maximum image fusion level, the transform coefficients are obtained using the modified wavelet transform. In the next phase, the fused coefficients of the MDWT are given as input to the convolutional neural network (CNN) classifier. The accuracy of the classifier is enhanced using the hybrid optimization dynamic (HOD) algorithm. The fused image is classified as malignant or benign by the CNN with the HOD. The HOD is utilized to enhance the classification accuracy and also to optimize the weights of the CNN.
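As a high-level illustration, the following Python skeleton sketches the two stages of this pipeline; both stage functions are trivial stand-ins introduced here for illustration (the authors' implementation is in MATLAB, and the real MDWT fusion and CNN-HOD classifier are detailed in Sections 3.2, 3.3 and 3.4).

```python
# Hypothetical end-to-end skeleton of the proposed pipeline (Figure 1).
# The two stage functions below are trivial stand-ins, not the authors' code.
import numpy as np

def fuse_stage(img_a: np.ndarray, img_b: np.ndarray) -> np.ndarray:
    # Placeholder for the MDWT-based fusion stage (Section 3.2): simple average here.
    return 0.5 * (img_a + img_b)

def classify_stage(fused: np.ndarray) -> str:
    # Placeholder for the CNN-HOD classifier (Sections 3.3-3.4).
    return "benign" if fused.mean() < 128 else "malignant"

ct = np.random.randint(0, 256, (512, 512)).astype(float)   # stand-in CT slice
mri = np.random.randint(0, 256, (512, 512)).astype(float)  # stand-in MRI slice
fused = fuse_stage(ct, mri)
print(classify_stage(fused))
```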

3.1. Shearlet Transform

The shearlet transform (ST) emerged as a dominant multiscale geometric analysis (MGA) system with an elegant mathematical form. It is locally multi-directional, well-localized, shift-invariant, multiscale, and ideally sparse. The combined dilation for the affine system is expressed as

$\nu_{j,k,l}(y) = \left|\det A_M\right|^{j/2}\, \nu\!\left(s^{k} A_M^{j} y - l\right), \quad j, k \in \mathbb{Z},\; l \in \mathbb{Z}^2$

The anisotropic matrix $A_M$ controls the shearlet scale and the shear matrix $s$ controls the direction. The shift, direction, and scale parameters are denoted by $l$, $k$, and $j$, respectively. The invertible matrices $A_M$ and $s$ are expressed as

$A_M = \begin{bmatrix} e & 0 \\ 0 & e^{1/2} \end{bmatrix} \quad \text{and} \quad s = \begin{bmatrix} 1 & S \\ 0 & 1 \end{bmatrix}$

The shearlet function is computed as

$\hat{\nu}_0(\gamma) = \hat{\nu}_0(\gamma_1, \gamma_2) = \hat{\nu}_1(\gamma_1)\, \hat{\nu}_2\!\left(\frac{\gamma_2}{\gamma_1}\right)$

where $\hat{\nu}$ denotes the Fourier transform of $\nu$, with

$\sum_{j \ge 0} \left| \hat{\nu}_1\!\left(2^{-2j}\eta\right) \right|^2 = 1, \quad |\eta| \ge \frac{1}{8}$

For each $j \ge 0$, $\nu_2$ satisfies

$\sum_{l=-2^{j}}^{2^{j}-1} \left| \hat{\nu}_2\!\left(2^{j}\eta - l\right) \right|^2 = 1, \quad |\eta| \le 1$

Combining Equations (4) and (5) yields

$\sum_{j \ge 0} \sum_{l=-2^{j}}^{2^{j}-1} \left| \hat{\nu}_0\!\left( \eta A_{M_0}^{-j} s_0^{-l} \right) \right|^2 = \sum_{j \ge 0} \sum_{l=-2^{j}}^{2^{j}-1} \left| \hat{\nu}_1\!\left( 2^{-2j} \gamma_1 \right) \right|^2 \left| \hat{\nu}_2\!\left( 2^{j} \frac{\gamma_2}{\gamma_1} - l \right) \right|^2 = 1$

From the above equations, the discrete NSST is obtained.
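To make the roles of the two matrices concrete, the short NumPy sketch below constructs an anisotropic scaling matrix and a shear matrix for a chosen scale and shear index and composes them; the dilation base and index values are illustrative assumptions, not parameters taken from the paper.

```python
# Illustrative construction of the shearlet system's anisotropic scaling and
# shear matrices; e (the dilation base), j, and l are arbitrary example values.
import numpy as np

def scaling_matrix(e: float, j: int) -> np.ndarray:
    # Anisotropic dilation: one axis scaled by e^j, the other by e^(j/2).
    return np.array([[e ** j, 0.0],
                     [0.0, e ** (j / 2)]])

def shear_matrix(s: float, k: int) -> np.ndarray:
    # Shearing controls the orientation (direction) of the shearlet.
    return np.linalg.matrix_power(np.array([[1.0, s], [0.0, 1.0]]), k)

A = scaling_matrix(e=4.0, j=2)
S = shear_matrix(s=1.0, k=1)
print(S @ A)  # combined dilation-shear matrix applied in the affine system
```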

3.2. Modified Discrete Wavelet Transform

The input images are decomposed by the modified discrete wavelet transform (DWT). The one-dimensional analysis is extended to a multi-resolution analysis based on the two-dimensional wavelet transform. If $\phi(a)$ and $\eta(a)$ denote the one-dimensional scaling function and wavelet function, respectively, then one 2-D scaling function and three 2-D wavelet functions form the basis of the 2-D wavelet transform:

$\phi(a, b) = \phi(a)\,\phi(b)$

$\eta^{U}(a, b) = \phi(a)\,\eta(b), \qquad \eta^{G}(a, b) = \eta(a)\,\phi(b), \qquad \eta^{E}(a, b) = \eta(a)\,\eta(b)$

For an L-level decomposition of the image $F(a, b)$, the approximation and the three detail transform coefficients are computed as

$X_{M} F(a, b) = \left\langle F(a, b),\, \phi_{M}(a, b) \right\rangle$

$E_{M}^{U} F(a, b) = \left\langle F(a, b),\, \eta_{M}^{U}(a, b) \right\rangle$

$E_{M}^{G} F(a, b) = \left\langle F(a, b),\, \eta_{M}^{G}(a, b) \right\rangle$

$E_{M}^{E} F(a, b) = \left\langle F(a, b),\, \eta_{M}^{E}(a, b) \right\rangle$
The procedure for single-level wavelet decomposition is as follows (a per-plane sketch in Python follows this list):
  • Obtain the original source image as well as the secret image, extract the red (R) plane from each separately, and apply the single-level 2-D Daubechies DWT decomposition to the input source image and the secret image.
  • Let the embedding coefficient be represented as x, with values ranging from 0 to 1; as x increases, there is a large rise in robustness and a small rise in transparency.
  • The approximation coefficient of the embedded image is computed as (1 − x) × approximation coefficient of the input image + x × approximation coefficient of the secret image. A similar expression is utilized to compute the diagonal, horizontal, and vertical coefficients of the embedded image.
  • Apply the single-level 2-D Daubechies inverse DWT to the computed horizontal, diagonal, approximation, and vertical coefficients to obtain the R plane of the integrated image.
  • The above scheme is repeated for the blue (B) plane and green (G) plane separately, and the blue (B), green (G), and red (R) planes are combined to achieve the integrated image.
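A minimal sketch of the per-plane embedding step is given below, assuming a single-level Daubechies decomposition via the PyWavelets library and an embedding coefficient x supplied by the caller; it handles one color plane and would be repeated for the R, G, and B planes as described above.

```python
# Sketch of the per-plane embedding step, assuming a single-level Daubechies
# DWT from PyWavelets; x is the embedding coefficient in [0, 1].
import numpy as np
import pywt

def embed_plane(source_plane: np.ndarray, secret_plane: np.ndarray, x: float = 0.3) -> np.ndarray:
    # Single-level 2-D Daubechies decomposition of both planes.
    cs, (hs, vs, ds) = pywt.dwt2(source_plane.astype(float), "db1")
    ct, (ht, vt, dt) = pywt.dwt2(secret_plane.astype(float), "db1")

    # Weighted combination: (1 - x) * source coefficient + x * secret coefficient,
    # applied identically to the approximation, horizontal, vertical and diagonal bands.
    blend = lambda a, b: (1.0 - x) * a + x * b
    fused = (blend(cs, ct), (blend(hs, ht), blend(vs, vt), blend(ds, dt)))

    # Inverse single-level 2-D Daubechies DWT gives the embedded plane.
    return pywt.idwt2(fused, "db1")
```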

3.3. CNN-HOD-Based Image Fusion Process

Image fusion represents the procedure of combining the diverse images obtained via the different modalities. During the image fusion process, the transform coefficients obtained from MDWT are given to the CNN-HOD technique to classify the fused images into malignant or benign. The HOD is utilized to enhance the classification accuracy and also to optimize the weights of the CNN. A basic description based on the CNN and HOD optimization algorithm is given in the following section.

3.3.1. Convolutional Neural Network (CNN)

The convolutional neural network (CNN) has had enormous growth in various fields of application for solving problems concerning the classification of images [24]. CNN architecture contains a convolutional layer, pooling layer, and SoftMax layer.

Convolutional Layer

The proposed convolutional neural network comprises three convolutional layers. The first convolutional layer is utilized to extract numerous low-level features from the input image, namely edges, corners, and lines. The other two convolutional layers capture high-level attributes. Every output map integrates numerous input maps via convolutions. Normally, the output is specified by the following formula:

$a_{k}^{m} = f\!\left( \sum_{j \in M_{k}} a_{j}^{m-1} * l_{jk}^{m} + v_{k}^{m} \right)$

where $m$ indicates the $m$th layer, $l_{jk}$ the convolutional kernel, $v_{k}$ the bias, and $M_{k}$ the set of input maps. The detailed CNN implementation uses the sigmoid activation function, and an additive bias is also employed. For instance, the unit value at location $(a, b)$ in the $k$th feature map of the $j$th layer is given by

$y_{jk}^{ab} = \mathrm{sigmoid}\!\left( v_{jk} + \sum_{p=0}^{P_{j}-1} \sum_{q=0}^{Q_{k}-1} z_{jk}^{pq}\, y_{(j-1)}^{(a+p)(b+q)} \right)$

where $\mathrm{sigmoid}(\cdot)$ represents the sigmoid function, $v_{jk}$ the feature map bias, $P_{j}$ and $Q_{k}$ the kernel height and width, and $z_{jk}^{pq}$ the value of the kernel weight at location $(p, q)$ associated with the $(j, k)$ layer. The CNN parameters are therefore $v_{jk}$ and the kernel weights $z_{jk}^{pq}$.

Pooling Layer

A pooling (sub-sampling) layer in a CNN is employed to decrease the variance; it evaluates the maximum value of a given attribute over a region of the image. The pooling layer plays a significant role in classification and recognition. First, the probability $p_j$ is evaluated for every region $k$ according to Equation (15) below.

$p_{j} = \frac{\alpha_{j}}{\sum_{l \in R_{k}} \alpha_{l}}$

where $R_{k}$ denotes the pooling region $k$ of the feature map and $j$ indexes the elements inside the region. The advantage of this type of implementation is that the pooling layer does not slow the convergence of the CNN and also increases its generalization capability.
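As a small numerical illustration of Equation (15), the snippet below normalizes the activations inside one pooling region into probabilities; the region values are made up, and the final stochastic draw (in the spirit of stochastic pooling) is an assumption about how such probabilities might be used rather than a step stated in the text.

```python
# Toy illustration of Equation (15): activations inside one pooling region are
# normalized into selection probabilities; the values are arbitrary examples.
import numpy as np

region = np.array([0.2, 1.5, 0.8, 0.5])       # activations alpha_l in region R_k
probs = region / region.sum()                  # p_j = alpha_j / sum_l alpha_l
pooled = np.random.default_rng(0).choice(region, p=probs)  # probability-weighted draw (assumed usage)
print(probs, pooled)
```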

SoftMax Layer

A SoftMax layer is employed for the multi-class classification problem. The hypothesis function takes the form:

$g_{\phi}(a) = \frac{1}{1 + e^{-\phi^{T} a}}$

The major objective of this layer is to train $\phi$ to decrease the cost function $L(\phi)$:

$L(\phi) = -\frac{1}{n} \sum_{j=1}^{n} \sum_{k=1}^{m} \mathbf{1}\!\left\{ b^{(j)} = k \right\} \log p\!\left( b^{(j)} = k \mid a^{(j)}; \phi \right)$

The network is trained on $\left( a^{(1)}, b^{(1)} \right), \ldots, \left( a^{(n)}, b^{(n)} \right)$ with $b^{(j)} \in \{1, 2, \ldots, l\}$. The probability that a sample $a$ is classified into class $k$ in the SoftMax layer is:

$p\!\left( b^{(j)} = k \mid a^{(j)}; \phi \right) = \frac{e^{\phi_{k}^{T} a^{(j)}}}{\sum_{m=1}^{l} e^{\phi_{m}^{T} a^{(j)}}}$

A supervised learning approach is utilized for network training. The internal representation reflects the similarity between the training samples. The image attributes are visualized by averaging the image patches that are interrelated by neurons with a stochastic response in an upper layer. The classification part contains two kinds of layers, namely dense layers and dropout layers. The dense layer, also termed the fully connected layer, consists of various neurons or units, and the last dense layer contains as many neurons as there are classes. After every dense layer, an activation layer is added. The activation function employed for the last dense layer output differs from that employed for the other dense layers; the sigmoid or SoftMax function is normally used. A SoftMax layer is used in multi-class classification to assign decimal probabilities to every class, and the target class should receive the highest probability. The SoftMax of the jth output unit is numerically evaluated by the following equation.

$\hat{b}_{j} = \frac{e^{a_{j}}}{\sum_{j'=1}^{M} e^{a_{j'}}} \quad \text{for } j = 1, 2, 3, \ldots, M$

where $a_{j}$ indicates the output of the $j$th dimension, $M$ represents the number of dimensions (equal to the number of categories), and $\hat{b}_{j}$ is the probability associated with the $j$th category. After the prediction is made, the sample is allocated to the class with the highest probability:

$\hat{b} = \max_{j \in [1, M]} \hat{b}_{j}$

The sigmoid function is employed for binary classification tasks. It accepts values from any range of numbers and returns a value in the interval [0, 1]. The sigmoid function is expressed as:

$\mathrm{Sigmoid}(a) = \frac{1}{1 + e^{-a}}$

Dropout layers are regularization methods applied only during network training to prevent overfitting by temporarily dropping a subset of the input neurons and their connections from the preceding dense layer. Dense layers are normally followed by a dropout layer, apart from the last dense layer, which generates the class-specific probabilities. Here, a ResNet model is used as a pre-trained model for the CNN classification. The accuracy of the classifier is improved by using the hybrid optimization dynamic algorithm.
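The PyTorch sketch below assembles a small network with the layer roles described above: three convolutional layers, pooling, dropout-regularized dense layers, and a softmax output over the benign/malignant classes. Only the 7 × 7 kernel size, 0.5 dropout rate, Adam optimizer, and 0.001 learning rate follow Table 2; the channel counts, input resolution, and hidden size are illustrative assumptions, and the ResNet pre-training mentioned above is omitted.

```python
# Illustrative PyTorch version of the described classifier: three convolutional
# layers, max pooling, dropout-regularized dense layers and a softmax over the
# two classes (benign / malignant). Channel counts and input size are assumed.
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=7, padding=3), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=7, padding=3), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=7, padding=3), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 64 * 64, 128), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(128, num_classes),          # final dense layer: one unit per class
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.classifier(self.features(x))
        return torch.softmax(logits, dim=1)       # class probabilities

model = FusionClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # settings from Table 2
probs = model(torch.randn(1, 1, 512, 512))                  # one 512x512 fused image
print(probs)
```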

3.4. Hybrid Optimization Dynamic (HOD) Algorithm

The HOD combines the sailfish optimizer algorithm and the seagull optimization algorithm; the HOD algorithm (see Figure 2) is formed from both algorithms. In the sailfish optimizer algorithm, the elitism operation is replaced by the migration operation of the seagull optimization algorithm, because elitism merely copies the unaltered fittest solution into the next generation, whereas in the seagull optimization algorithm the migration operation is utilized for finding the fittest solution [25]. During migration, the seagulls move as groups. The initial locations of the seagulls are different in order to avoid collisions with each other. Within a particular group, the seagulls move in the direction of the optimal seagull.

3.4.1. Sailfish Optimizer Algorithm

The main motivation of the sailfish optimizer algorithm is described in this part, and its numerical formulation is presented as follows.

Initialization

The sailfish optimizer algorithm is a population-based metaheuristic algorithm. In this method, the sailfish are regarded as the candidate solutions, and the locations of the sailfish are the variables in the search space. Therefore, the population is randomly created over the solution space. The sailfish search in one-, two-, three-, or hyper-dimensional spaces via their variable location vectors. In the $e$-dimensional search space, the $j$th member at the $l$th search iteration has the current location $Sf_{j,l}$, $j = 1, 2, \ldots, n$. The sailfish matrix is kept for saving the locations of all the sailfish; hence, these locations represent the variables of the entire solution during the optimization process.

$Sf_{\mathrm{location}} = \begin{bmatrix} Sf_{1,1} & Sf_{1,2} & \cdots & Sf_{1,e} \\ Sf_{2,1} & Sf_{2,2} & \cdots & Sf_{2,e} \\ \vdots & \vdots & \ddots & \vdots \\ Sf_{n,1} & Sf_{n,2} & \cdots & Sf_{n,e} \end{bmatrix}$

where $n$ represents the number of sailfish, $e$ the number of variables, and $Sf_{j,k}$ the value of the $k$th dimension of the $j$th sailfish. Additionally, the fitness of every sailfish is evaluated via the fitness function as follows:

$\mathrm{SailfishFitnessValue} = F_{\mathrm{sailfish}} = F\!\left( Sf_{1}, Sf_{2}, \ldots, Sf_{n} \right)$

Every sailfish is evaluated using the following matrix, which describes the fitness values of the entire population:

$Sf_{\mathrm{fitness}} = \begin{bmatrix} F(Sf_{1,1}, Sf_{1,2}, \ldots, Sf_{1,e}) \\ F(Sf_{2,1}, Sf_{2,2}, \ldots, Sf_{2,e}) \\ \vdots \\ F(Sf_{n,1}, Sf_{n,2}, \ldots, Sf_{n,e}) \end{bmatrix} = \begin{bmatrix} F(Sf_{1}) \\ F(Sf_{2}) \\ \vdots \\ F(Sf_{n}) \end{bmatrix}$

where $n$ represents the number of sailfish, $Sf_{j,k}$ the value of the $k$th dimension of the $j$th sailfish, $F$ the fitness function, and $Sf_{\mathrm{fitness}}$ stores the fitness value returned for every sailfish. The first row of the $Sf_{\mathrm{location}}$ matrix is passed to the fitness function, and the output represents the fitness value of the respective sailfish in the $Sf_{\mathrm{fitness}}$ matrix.

The sardine group is another important participant in the sailfish optimizer algorithm. It is presumed that the sardine group swims in the search space. The sardine locations and their fitness values are stored as follows.

$S_{\mathrm{location}} = \begin{bmatrix} S_{1,1} & S_{1,2} & \cdots & S_{1,e} \\ S_{2,1} & S_{2,2} & \cdots & S_{2,e} \\ \vdots & \vdots & \ddots & \vdots \\ S_{m,1} & S_{m,2} & \cdots & S_{m,e} \end{bmatrix}$

where $m$ represents the number of sardines, $S_{j,k}$ the value of the $k$th dimension of the $j$th sardine, and $S_{\mathrm{location}}$ the locations of all sardines.

$S_{\mathrm{fitness}} = \begin{bmatrix} F(S_{1,1}, S_{1,2}, \ldots, S_{1,e}) \\ F(S_{2,1}, S_{2,2}, \ldots, S_{2,e}) \\ \vdots \\ F(S_{m,1}, S_{m,2}, \ldots, S_{m,e}) \end{bmatrix} = \begin{bmatrix} F(S_{1}) \\ F(S_{2}) \\ \vdots \\ F(S_{m}) \end{bmatrix}$

where $m$ represents the number of sardines, $S_{j,k}$ the value of the $k$th dimension of the $j$th sardine, $F$ the objective function, and $S_{\mathrm{fitness}}$ keeps the fitness value of every sardine. Notably, the sailfish and the sardines are equivalent factors for computing the solutions. In this method, the sailfish are the main agents distributed in the search space, and the sardines cooperate to find the optimal location in this region. In effect, the sardines are eaten by the sailfish while searching the space, and the sailfish update their locations toward the best solution achieved so far.
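A compact NumPy sketch of this initialization is shown below, assuming box-bounded decision variables and a placeholder fitness function (a simple sphere function standing in for the CNN-weight objective); the sailfish population size follows Table 2, while the sardine count and bounds are assumptions.

```python
# Illustrative initialization of the sailfish and sardine populations as
# n x e and m x e matrices; the sphere function is only a stand-in objective.
import numpy as np

rng = np.random.default_rng(1)
e = 5                        # number of decision variables (dimensions)
n_sailfish, n_sardines = 30, 90
lower, upper = -1.0, 1.0     # assumed search-space bounds

def fitness(population: np.ndarray) -> np.ndarray:
    # Placeholder objective; in the paper this would score candidate CNN weights.
    return np.sum(population ** 2, axis=1)

sailfish = rng.uniform(lower, upper, size=(n_sailfish, e))   # Sf_location
sardines = rng.uniform(lower, upper, size=(n_sardines, e))   # S_location
sf_fitness = fitness(sailfish)                                # Sf_fitness
s_fitness = fitness(sardines)                                 # S_fitness

elite_sailfish = sailfish[np.argmin(sf_fitness)]   # best sailfish so far
injured_sardine = sardines[np.argmin(s_fitness)]   # best (injured) sardine so far
```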

Migration

Migration corresponds to the exploration phase of the seagull model: it simulates a seagull group moving toward one location, and this swarm movement is modeled mathematically for exploration. Three rules are followed here.

Collision Avoidance

Collisions among neighbors are avoided; the supplementary variable $X$ is utilized for the computation of the new search agent location.

$Z_{r} = X \times Q_{s}(a)$

where $Z_{r}$ indicates the location of the search agent that does not collide with the other search agents, $Q_{s}$ indicates the present location of the search agent, $a$ represents the present iteration, and $X$ indicates the movement behavior of the search agent in the search space.

$X = F_{z} - a \times \left( F_{z} / M_{\mathrm{itr}} \right), \quad a = 0, 1, 2, \ldots, M_{\mathrm{itr}}$

where $F_{z}$ controls the frequency of employing the variable $X$, which is linearly reduced from $F_{z}$ to 0. The value of $F_{z}$ is set to 2 in this work.

Moving towards the Direction of the Optimum Seagull

Once collision avoidance is completed, the search agents move in the direction of the optimum neighbor. This behavior satisfies the rule described below.

$N_{r} = Y \times \left( Q_{yr}(a) - Q_{r}(a) \right)$

where $N_{r}$ indicates the movement of the search agent $Q_{r}$ toward the best-fit search agent $Q_{yr}$. The behavior of $Y$ is randomized; it is in charge of properly balancing exploitation and exploration. The formula for $Y$ is expressed as

$Y = 2 \times X^{2} \times se$

where $se$ represents a random number in the interval [0, 1].

Sustaining Close to the Shortest Distance to the Optimal Search Agent

The search agents update their locations, which is modeled as follows:

$E_{r} = \left| Z_{r} + N_{r} \right|$

where $E_{r}$ indicates the distance between the search agent and the best-fit search agent.
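The three migration rules can be sketched together as one update step; the positions below are illustrative vectors, and only the value F_z = 2 is taken from the text.

```python
# Illustrative seagull migration step (collision avoidance, movement toward the
# best agent, and distance update); positions are example vectors, F_z = 2.
import numpy as np

def migrate(position: np.ndarray, best_position: np.ndarray,
            iteration: int, max_iterations: int, fz: float = 2.0,
            rng: np.random.Generator = np.random.default_rng()) -> np.ndarray:
    x = fz - iteration * (fz / max_iterations)        # X decays linearly from F_z to 0
    z = x * position                                   # collision-free position Z_r
    y = 2.0 * x ** 2 * rng.random()                    # Y balances exploration/exploitation
    n = y * (best_position - position)                 # move toward the optimum seagull
    return np.abs(z + n)                               # distance E_r to the best-fit agent

pos = np.array([0.4, -0.2, 0.7])
best = np.array([0.1, 0.0, 0.3])
print(migrate(pos, best, iteration=10, max_iterations=200))
```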

Attack-Interchange Scheme

The sailfish frequently attack the prey when any of their neighbors are attacking. Sometimes, the sailfish improve the success rate with a temporarily synchronized attack. The sailfish chase as well as herd their prey. A herding sailfish changes its position with respect to the positions of the other hunters around the prey school, without direct synchronization among them. Consequently, the sailfish update their locations inside a sphere around the optimal solution. At the $j$th iteration of the sailfish procedure, the new location of the sailfish $A_{\mathrm{new\_Sf}}^{j}$ is updated as follows:

$A_{\mathrm{new\_Sf}}^{j} = A_{\mathrm{elite\_Sf}}^{j} - \eta_{j} \times \left( \mathrm{rand}(0, 1) \times \frac{A_{\mathrm{elite\_Sf}}^{j} + X_{\mathrm{injured\_S}}^{j}}{2} - X_{\mathrm{old\_Sf}}^{j} \right)$

where $A_{\mathrm{elite\_Sf}}^{j}$ represents the location of the elite sailfish found so far, $X_{\mathrm{injured\_S}}$ the best location of the injured sardine found so far, $X_{\mathrm{old\_Sf}}$ the present location of the sailfish, $\mathrm{rand}(0, 1)$ a random number between 0 and 1, and $\eta_{j}$ the coefficient at the $j$th iteration, which is generated as follows:

$\eta_{j} = 2 \times \mathrm{rand}(0, 1) \times P_{d} - P_{d}$

where $P_{d}$ represents the prey density, which describes the number of prey at every iteration. Because the number of prey decreases during group hunting by the sailfish, the factor $P_{d}$ is important for the sailfish location update around the prey school. The adaptive expression for this factor is as follows:

$P_{d} = 1 - \frac{M_{Sf}}{M_{Sf} + M_{S}}$

where $M_{Sf}$ and $M_{S}$ represent the numbers of sailfish and sardines in every cycle of the algorithm. Additionally, because the initial number of sardines is larger than that of sailfish, $M_{Sf}$ is described as $M_{S} \times P_{p}$, in which $P_{p}$ indicates the percentage of the sardine population that forms the initial sailfish population. With respect to the average distance between the location of the present optimal sailfish and the present optimal sardine, the location of the sailfish is updated during the iterations. Using this scheme, the promising region of the search space is preserved. The sailfish reach various places around the school by altering the value of $\eta$. According to Equation (33), the variation interval of $\eta$ is in the range of −1 to 1, but it depends on the number of prey. By reducing $P_{d}$, the value of $\eta$ moves nearer to −1 or 1 according to $\mathrm{rand}(0, 1)$ in Equation (33). The factor $\eta$ tends toward 1 when $\mathrm{rand}(0, 1) > 0.5$, toward −1 when $\mathrm{rand}(0, 1) < 0.5$, and is zero for $\mathrm{rand}(0, 1) = 0.5$. Through the fluctuation of $\eta$, the locations of the sailfish are updated relative to each other and converge around the prey schools.
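A NumPy sketch of this sailfish position update, using the elite sailfish, the injured sardine, and the prey-density-dependent coefficient η described above; all vectors and population sizes are illustrative.

```python
# Illustrative sailfish position update: each sailfish moves relative to the
# elite sailfish and the injured sardine, scaled by a coefficient eta that
# depends on the prey density Pd.
import numpy as np

def update_sailfish(sailfish: np.ndarray, elite: np.ndarray, injured: np.ndarray,
                    n_sardines: int, rng: np.random.Generator) -> np.ndarray:
    pd = 1.0 - sailfish.shape[0] / (sailfish.shape[0] + n_sardines)  # prey density
    new_positions = np.empty_like(sailfish)
    for j, old in enumerate(sailfish):
        eta = 2.0 * rng.random() * pd - pd               # coefficient in [-Pd, Pd]
        new_positions[j] = elite - eta * (rng.random() * (elite + injured) / 2.0 - old)
    return new_positions

rng = np.random.default_rng(2)
sailfish = rng.uniform(-1, 1, size=(30, 5))
elite = sailfish[0]
injured = rng.uniform(-1, 1, size=5)
sailfish = update_sailfish(sailfish, elite, injured, n_sardines=90, rng=rng)
```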

Hunting as Well as Catching Prey

At the start of group hunting, a complete slaughter of the sardines is hardly ever observed. In most cases, sardine scales are torn off when the sailfish bill strikes the body of a sardine, which leaves a large number of sardines in the schools with pronounced injuries on their bodies. At the start of the hunt, the sailfish have more energy to capture the prey, and the sardines are not yet injured or tired; hence, the sardines maintain a high escape speed and a high capability to move. To imitate this procedure, every sardine updates its location with respect to the present optimal location of the sailfish and the attack power at every iteration. In the sailfish algorithm, at the $j$th iteration, the new location of the sardine $A_{\mathrm{new\_S}}^{j}$ is expressed as

$A_{\mathrm{new\_S}}^{j} = s \times \left( A_{\mathrm{elite\_Sf}}^{j} - A_{\mathrm{old\_S}}^{j} + AP \right)$

where $A_{\mathrm{elite\_Sf}}^{j}$ represents the optimal location of the elite sailfish found so far, $A_{\mathrm{old\_S}}^{j}$ the present location of the sardine, $s$ a random number between 0 and 1, and $AP$ the attack power of the sailfish at every iteration, which is generated as follows:

$AP = X \times \left( 1 - \left( 2 \times \mathrm{Itr} \times \alpha \right) \right)$

where $X$ and $\alpha$ are coefficients for reducing the attack power linearly from $X$ to 0. Equations (35) and (36) give a few of the probable positions of the sardines once slashing of the prey school is finished. Once the sailfish attack, the sardines suddenly escape to various places; the sardines then update their locations away from the predator, and the risk decreases with respect to the $s$ and $AP$ factors. In fact, the intensity of the sailfish attack power is reduced over time, which helps the search agents converge. Using the parameter $AP$, the number of sardines that update their locations and the number of variables to be updated are computed as follows:

$\varepsilon = M_{S} \times AP$

$\phi = e_{j} \times AP$

where $e_{j}$ represents the number of variables at the $j$th iteration and $M_{S}$ the number of sardines in every cycle of the algorithm. With respect to the factor $AP$, when the sailfish attack intensity is low, only $\varepsilon$ sardines with $\phi$ variables each are updated; however, when the sailfish attack intensity is high, the locations of all the sardines are updated. Together, the $AP$ and $s$ factors introduce random behavior that helps the sailfish optimizer escape local stagnation throughout the iterations. The sailfish location is substituted by the latest location of the hunted sardine to increase the chance of hunting new prey, which is expressed as follows:

$A_{Sf}^{j} = A_{S}^{j} \quad \text{if} \quad F\!\left( S_{j} \right) < F\!\left( Sf_{j} \right)$

where $A_{S}^{j}$ represents the present location of the sardine at the $j$th iteration and $A_{Sf}^{j}$ the present location of the sailfish at the $j$th iteration. In each iteration, the location of every sailfish is updated with respect to the elite sailfish and the injured sardine. The locations of the sardines are updated using the chosen elite sailfish and sardines, depending on the sailfish attack power. Once the procedure of updating the locations of the sardines and the sailfish is completed, the objective function is evaluated. The locations of the elite sailfish and the injured sardine are updated in every cycle of the algorithm, and the hunted sardines are then removed from the population. These steps are repeated iteratively until the termination criterion is satisfied.
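The corresponding sardine update and the sailfish-sardine interchange can be sketched as follows; the attack-power decay constants (X = 4, α = 0.001) are assumed default values from the standard sailfish optimizer rather than settings stated in the paper.

```python
# Illustrative sardine update and sailfish-sardine interchange: the attack power
# AP decays over iterations; when AP is low only a fraction of the sardines (and
# of their variables) is updated, and any sardine that beats a sailfish replaces it.
import numpy as np

def update_sardines(sardines, elite, iteration, x=4.0, alpha=0.001,
                    rng=np.random.default_rng(3)):
    ap = x * (1.0 - 2.0 * iteration * alpha)              # attack power, decays toward 0
    if ap >= 0.5:                                          # strong attack: update all sardines
        return rng.random(sardines.shape) * (elite - sardines + ap)
    n_upd = max(1, int(sardines.shape[0] * ap))            # epsilon: sardines to update
    d_upd = max(1, int(sardines.shape[1] * ap))            # phi: variables to update
    out = sardines.copy()
    rows = rng.choice(sardines.shape[0], n_upd, replace=False)
    cols = rng.choice(sardines.shape[1], d_upd, replace=False)
    out[np.ix_(rows, cols)] = rng.random((n_upd, d_upd)) * \
        (elite[cols] - sardines[np.ix_(rows, cols)] + ap)
    return out

def interchange(sailfish, sardines, fitness):
    # Replace a sailfish with any sardine that achieves a better fitness value.
    sf_fit, s_fit = fitness(sailfish), fitness(sardines)
    for j in range(min(len(sailfish), len(sardines))):
        if s_fit[j] < sf_fit[j]:
            sailfish[j] = sardines[j]
    return sailfish
```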

4. Results and Discussions

The proposed approach is verified using 270 pairs of source images. All source images were gathered from the Whole Brain Atlas of Harvard Medical School. The examinations were conducted using a set of images that contains CT, MRI, SPECT, and PET images. The database images are depicted in Figure 3. All the source images have an identical spatial resolution of 512 × 512 pixels with 256 gray-scale levels. The proposed HOD-CNN is implemented in MATLAB 2018a on a system with an Intel i7 processor and 8 GB of RAM. For the investigation, the database was split 75:25 for training and testing. The parameters used to tune the proposed method are listed in Table 2.

4.1. Performance Measures

The computation of image fusion quality is classified into subjective and objective computations. The performance measures are chosen as appropriate indices to reflect the effect of the human visual system on image quality perception. The performance of the various approaches is computed using measures such as edge information retention (QAB/F), average gradient (AG), standard deviation (SD), mutual information (MI), entropy (EN), spatial frequency (SF), and fusion factor (FF). Edge information retention quantifies the amount of edge detail from the input images that is transferred into the fused image. The average gradient characterizes image sharpness; a larger average gradient indicates a clearer image. The standard deviation reflects the degree of dispersion of the pixel values around the mean value of the image; a larger standard deviation indicates better image quality. Mutual information computes how much information from the source images is present in the fused image. Entropy quantifies the amount of information available in the source image as well as the fused image. Spatial frequency represents the overall activity of the image in the spatial domain, and its magnitude is proportional to the quality of the image fusion. The fusion factor is a well-known performance measure that describes the strength of the fusion procedure.
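For reference, the following NumPy sketch computes several of these objective measures (average gradient, standard deviation, entropy, and spatial frequency) for a fused image using their standard definitions; the exact formulas are assumptions insofar as the paper does not spell them out.

```python
# Common objective fusion measures implemented with their standard (assumed)
# definitions: average gradient, standard deviation, entropy and spatial frequency.
import numpy as np

def average_gradient(img: np.ndarray) -> float:
    gx, gy = np.gradient(img.astype(float))
    return float(np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2.0)))

def entropy(img: np.ndarray) -> float:
    hist, _ = np.histogram(img, bins=256, range=(0, 256), density=True)
    hist = hist[hist > 0]
    return float(-np.sum(hist * np.log2(hist)))

def spatial_frequency(img: np.ndarray) -> float:
    img = img.astype(float)
    rf = np.sqrt(np.mean(np.diff(img, axis=1) ** 2))   # row frequency
    cf = np.sqrt(np.mean(np.diff(img, axis=0) ** 2))   # column frequency
    return float(np.sqrt(rf ** 2 + cf ** 2))

fused = np.random.randint(0, 256, (512, 512))           # stand-in fused image
print(average_gradient(fused), fused.std(), entropy(fused), spatial_frequency(fused))
```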
Figure 3. Database image (i) CT and MRI, (ii) MRI and PET, and (iii) MRI and SPECT.
Figure 4 portrays the graphical analysis of the average running time of the proposed approach and various other techniques, namely, NSCT (Kaur et al., 2021), particle swarm optimization (PSO) (Shehanaz et al., 2021), convolutional neural network (CNN) (Li et al., 2021), and the adolescent identity search (AIS) algorithm (Jose et al., 2021). The graph plots the running time of the various approaches. The evaluation results reveal that the proposed approach attains the minimum average running time of about 0.53 s compared with the other approaches.

4.2. Quantitative Analysis

The fused images for the three dataset image pairs are shown in Figure 5, Figure 6 and Figure 7. In this part, the proposed approach is compared with existing approaches such as NSST-PAPCNN [20], EMBO-DST [12], DNN-DGSA [14], and PCNN-NSST [15] in terms of edge information retention (QAB/F), average gradient (AG), standard deviation (SD), mutual information (MI), entropy (EN), spatial frequency (SF), and fusion factor (FF). Figure 5 shows the fusion results for CT and MRI images, Figure 6 for CT and PET images, and Figure 7 for CT and SPECT images. From the comparative analysis, the results reveal that the proposed approach attains better performance than the other approaches. Table 3, Table 4 and Table 5 describe the objective computation of the various methods on medical image fusion for CT and MRI, CT and PET, and CT and SPECT images, respectively. The results in Table 6 show that the proposed approach yields better results than the other approaches.

5. Conclusions

In this paper, fused multimodal medical image classification based on a CNN with HOD is proposed. The major role of this method is optimal fusion using the hybrid optimization dynamic algorithm. Initially, the multimodal medical images are fused using the modified discrete wavelet transform (MDWT). The fused image is classified as malignant or benign using a convolutional neural network (CNN). The HOD is utilized to enhance the classification accuracy of the CNN. The HOD combines the sailfish optimizer algorithm and the seagull optimization algorithm, in which the seagull migration operation replaces the elitism operation to obtain the optimal location. An experimental analysis is carried out and compared with the other approaches with respect to performance measures such as edge information retention (QAB/F), average gradient (AG), standard deviation (SD), mutual information (MI), entropy (EN), spatial frequency (SF), and fusion factor (FF). The proposed approach is compared with other approaches on the databases, and the comparison reveals that the proposed approach produces improved results. The experimental results show that the proposed approach performs better than other approaches and offers high-quality fused images for an accurate diagnosis. In the future, the proposed approach should be implemented in real-time applications and employed for other kinds of multimodal image fusion, such as multi-focus and infrared-visible image fusion.

Author Contributions

Conceptualization, M.M.A.; Data curation, A.M.A.; Methodology, M.M.A.; Software, A.M.A.; Supervision, A.M.A.; Writing—original draft, M.M.A.; Writing—review & editing, M.M.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the Institutional Review Board.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Tawfik, N.; Elnemr, H.A.; Fakhr, M.; Dessouky, M.I.; Abd El-Samie, F.E. Survey study of multimodality medical image fusion methods. Multimed. Tools Appl. 2021, 80, 6369–6396.
  2. Kumar, P.; Diwakar, M. A novel approach for multimodality medical image fusion over secure environment. Trans. Emerg. Telecommun. Technol. 2021, 32, e3985.
  3. Tang, L.; Tian, C.; Li, L.; Hu, B.; Yu, W.; Xu, K. Perceptual quality assessment for multimodal medical image fusion. Signal Process. Image Commun. 2020, 85, 115852.
  4. Li, W.; Lin, Q.; Wang, K.; Cai, K. Improving medical image fusion method using fuzzy entropy and non-subsampling contourlet transform. Int. J. Imaging Syst. Technol. 2021, 31, 204–214.
  5. Ullah, H.; Ullah, B.; Wu, L.; Abdalla, F.Y.; Ren, G.; Zhao, Y. Multimodality medical images fusion based on local-features fuzzy sets and novel sum-modified-Laplacian in non-subsampled shearlet transform domain. Biomed. Signal Process. Control 2020, 57, 101724.
  6. Fu, J.; Li, W.; Du, J.; Xiao, B. Multimodal medical image fusion via laplacian pyramid and convolutional neural network reconstruction with local gradient energy strategy. Comput. Biol. Med. 2020, 126, 104048.
  7. Li, X.; Zhao, J. A novel multi-modal medical image fusion algorithm. J. Ambient Intell. Humaniz. Comput. 2021, 12, 1995–2002.
  8. Singh, S.; Gupta, D. Multistage multimodal medical image fusion model using feature-adaptive pulse coupled neural network. Int. J. Imaging Syst. Technol. 2020, 31, 981–1001.
  9. Huang, B.; Yang, F.; Yin, M.; Mo, X.; Zhong, C. A review of multimodal medical image fusion techniques. Comput. Math. Methods Med. 2020, 2020, 8279342.
  10. Tirupal, T.; Mohan, B.C.; Kumar, S.S. Multimodal medical image fusion techniques—A review. Curr. Signal Transduct. Ther. 2020, 15, 142–163.
  11. Yadav, S.P.; Yadav, S. Image fusion using hybrid methods in multimodality medical images. Med. Biol. Eng. Comput. 2020, 58, 669–687.
  12. Subbiah Parvathy, V.; Pothiraj, S.; Sampson, J. A novel approach in multimodality medical image fusion using optimal shearlet and deep learning. Int. J. Imaging Syst. Technol. 2020, 30, 847–859.
  13. Wang, K.; Zheng, M.; Wei, H.; Qi, G.; Li, Y. Multi-modality medical image fusion using convolutional neural network and contrast pyramid. Sensors 2020, 20, 2169.
  14. Parvathy, V.S.; Pothiraj, S.; Sampson, J. Optimal Deep Neural Network model-based multimodality fused medical image classification. Phys. Commun. 2020, 41, 101119.
  15. Tan, W.; Tiwari, P.; Pandey, H.M.; Moreira, C.; Jaiswal, A.K. Multimodal medical image fusion algorithm in the era of big data. Neural Comput. Appl. 2020, 1–21.
  16. Li, X.; Guo, X.; Han, P.; Wang, X.; Li, H.; Luo, T. Laplacian redecomposition for multimodal medical image fusion. IEEE Trans. Instrum. Meas. 2020, 69, 6880–6890.
  17. Arif, M.; Wang, G. Fast curvelet transform through genetic algorithm for multimodal medical image fusion. Soft Comput. 2020, 24, 1815–1836.
  18. Kaur, M.; Singh, D. Multi-modality medical image fusion technique using multi-objective differential evolution based deep neural networks. J. Ambient Intell. Humaniz. Comput. 2021, 12, 2483–2493.
  19. Hu, Q.; Hu, S.; Zhang, F. Multi-modality medical image fusion based on separable dictionary learning and Gabor filtering. Signal Process. Image Commun. 2020, 83, 115758.
  20. Xia, J.; Lu, Y.; Tan, L. Research of multimodal medical image fusion based on parameter-adaptive pulse-coupled neural network and convolutional sparse representation. Comput. Math. Methods Med. 2020, 2020, 3290136.
  21. Shehanaz, S.; Daniel, E.; Guntur, S.R.; Satrasupalli, S. Optimum weighted multimodal medical image fusion using particle swarm optimization. Optik 2021, 231, 166413.
  22. Dinh, P.H. Multi-modal medical image fusion based on equilibrium optimizer algorithm and local energy functions. Appl. Intell. 2021, 51, 8416–8431.
  23. Dinh, P.H. A novel approach based on three-scale image decomposition and marine predators algorithm for multi-modal medical image fusion. Biomed. Signal Process. Control 2021, 67, 102536.
  24. Lu, Y.; Yi, S.; Zeng, N.; Liu, Y.; Zhang, Y. Identification of rice diseases using deep convolutional neural networks. Neurocomputing 2017, 267, 378–384.
  25. Dhiman, G.; Kumar, V. Seagull optimization algorithm: Theory and its applications for large-scale industrial engineering problems. Knowl.-Based Syst. 2019, 165, 169–196.
Figure 1. Block schematic of the proposed multimodal medical image fusion.
Figure 2. Flowchart of HOD algorithm.
Figure 4. Average running time analysis.
Figure 5. Fusion results of CT and MRI.
Figure 6. Fusion results of CT and PET images.
Figure 7. Fusion results of CT and SPECT images.
Table 1. Summary of related works regarding multimodal medical image fusion.

| Authors | Fusion Scheme | Modality | Dataset | Metrics | Cons |
| --- | --- | --- | --- | --- | --- |
| Yadav et al. [11] | Hybrid DWT-PCA | MRI, PET, SPECT, CT | Online repository datasets | EN, SD, RMSE, and PSNR | Low image quality; inconsistent performance, so low efficiency |
| Subbiah et al. [12] | EMBO-DST with RBM model | MRI, PET, SPECT, CT | Four sets of database images (D1, D2, D3, and D4) | SD, EQ, MI, FF, EN, CF, and SF | Implementation was complex |
| Wang et al. [13] | CNN | MRI, CT, T1, T2, PET, and SPECT | Eight online fused images | TE, AB/F, MI, and VIF | Difficult to fuse infrared-visible and multi-focus images |
| Parvathy et al. [14] | DNN with DGSA | CT, SPECT, and MRI | Four datasets (I, II, III, and IV) | Fusion factor and spatial frequency | Failed to execute in real-time applications |
| Tan et al. [15] | PCNN-NSST | MRI, PET, and SPECT | 100 pairs of multimodal medical images from the Whole Brain Atlas dataset | Entropy (EN), standard deviation (SD), normalized mutual information (NMI), Piella's structure similarity (SS), and visual information fidelity (VIF) | Fused image quality was poor |
| Li et al. [16] | Laplacian re-decomposition (LRD) | MRI, PET, and SPECT | 20 pairs of multimodal medical images collected from the Harvard University medical library | Standard deviation (STD), mutual information (MI), universal quality index (UQI), and tone-mapped image quality index (TMQI) | Difficult to propose more rapid and active methods of medical image enhancement and fusion |
| Kaur et al. [18] | NSCT | MRI, CT | Multi-modality biomedical image dataset obtained from Ullah et al. (2020) [5] | Fusion factor, fusion symmetry, mutual information, edge strength | Difficult to fuse remote sensing images |
| Hu et al. [19] | Analytic separable dictionary learning (ASeDiL) in the NSCT domain | CT and MRI | 127 groups of brain anatomical images from the Whole Brain Atlas medical image database | Piella–Heijmans similarity-based metric QE, spatial frequency (SF), universal image quality index (UIQI), and mutual information | High time consumption |
| Xia et al. [20] | Parameter-adaptive pulse-coupled neural network (PAPCNN) | CT, MRI, T1, T2, PET, and SPECT | Whole Brain Atlas of Harvard Medical School [13] and The Cancer Imaging Archive (TCIA) | Entropy (EN), edge information retention (QAB/F), mutual information (MI), average gradient (AG), spatial frequency (SF), and standard deviation (SD) | Implementation was complex |
| Shehanaz et al. [21] | Optimum weighted average fusion (OWAF) with particle swarm optimization (PSO) | MR-CT, MR-SPECT, and MR-PET | Brain images (http://www.med.harvard.edu/AANLIB/, accessed on 18 January 2022) | Standard deviation (STD), mutual information (MI), universal quality index (UQI) | Required more computational time |
| Dinh et al. [22] | SLE-PCO with EOA | MRI-PET medical images | http://www.med.harvard.edu/AANLIB/, accessed on 18 January 2022 | SD, EQ, MI, FF, EN, CF, and SF | High computational complexity |
| Dinh et al. [23] | FR-KCO and MPA | MRI-PET medical images | http://www.med.harvard.edu/AANLIB/, accessed on 18 January 2022 | Standard deviation (STD), mutual information (MI), universal quality index (UQI), and tone-mapped image quality index (TMQI) | Low information entropy |
Table 2. Parameter settings.

| Technique | Parameter | Value |
| --- | --- | --- |
| Convolutional neural network | Kernel size | 7 × 7 |
| | Learning rate | 0.001 |
| | Batch size | 32 |
| | Optimizer | Adam |
| | Dropout rate | 0.5 |
| Sailfish optimization algorithm | Initial population | 30 |
| | Total iterations | 100 |
| | Fluctuation range | −1 to 1 |
| | Random number | 0 to 1 |
| Seagull optimization algorithm | Population size | 100 |
| | Maximum iterations | 200 |
| | Control parameter | [2, 0] |
| | Frequency control parameter | 2 |
Table 3. Objective computation of various methods on medical image fusion for CT and MRI.

| Measures | PCNN-NSST | DNN-DGSA | EMBO-DST | NSST-PAPCNN | Proposed |
| --- | --- | --- | --- | --- | --- |
| QAB/F | 0.2082 | 0.2284 | 0.2653 | 0.3645 | 0.4672 |
| AG | 6.3128 | 6.6754 | 6.9816 | 7.6542 | 7.8914 |
| SD | 47.2761 | 48.1692 | 50.7616 | 51.6723 | 54.6870 |
| MI | 2.6892 | 2.7654 | 2.8974 | 3.1678 | 3.4152 |
| EN | 4.4264 | 4.5298 | 4.6784 | 4.8532 | 4.9952 |
| SF | 19.9757 | 20.7865 | 21.6738 | 23.8761 | 24.7622 |
| FF | 5.9824 | 6.0935 | 6.1382 | 6.5665 | 8.3281 |
Table 4. Objective computation of various methods on medical image fusion for CT and PET images.

| Measures | PCNN-NSST | DNN-DGSA | EMBO-DST | NSST-PAPCNN | Proposed |
| --- | --- | --- | --- | --- | --- |
| QAB/F | 0.3186 | 0.3484 | 0.3941 | 0.4457 | 0.5392 |
| AG | 6.1326 | 6.8331 | 7.9814 | 8.0642 | 8.4631 |
| SD | 46.1488 | 49.9126 | 51.8674 | 53.5648 | 55.4872 |
| MI | 2.9827 | 3.2673 | 3.8915 | 4.2186 | 4.6524 |
| EN | 4.5148 | 4.7528 | 4.9921 | 5.0885 | 5.1872 |
| SF | 22.8907 | 25.8733 | 28.9154 | 30.7372 | 32.8245 |
| FF | 6.1429 | 6.4634 | 6.8736 | 7.1984 | 7.3562 |
Table 5. Objective computation of various methods on medical image fusion for CT and SPECT images.

| Measures | PCNN-NSST | DNN-DGSA | EMBO-DST | NSST-PAPCNN | Proposed |
| --- | --- | --- | --- | --- | --- |
| QAB/F | 0.4177 | 0.4575 | 0.5884 | 0.6429 | 0.7362 |
| AG | 7.7648 | 7.9731 | 8.1583 | 8.6758 | 8.8763 |
| SD | 50.8734 | 52.7615 | 54.8324 | 56.7522 | 58.6421 |
| MI | 4.2541 | 4.4659 | 4.9826 | 5.0942 | 5.1644 |
| EN | 4.4328 | 4.6715 | 4.8259 | 5.2781 | 5.4638 |
| SF | 27.5714 | 29.8765 | 30.9816 | 32.7625 | 34.8712 |
| FF | 6.9876 | 7.6978 | 7.8573 | 8.2538 | 8.7642 |
Table 6. Evaluation of fusion results.

| Methods | Average Gradient | Fusion Factor | Standard Deviation |
| --- | --- | --- | --- |
| DWT | 6.5342 | 7.0346 | 50.4563 |
| Shearlet | 7.6859 | 7.2785 | 53.098 |
| Contourlet | 7.8219 | 7.8654 | 55.5231 |
| MDWT | 8.2731 | 8.0457 | 57.4563 |
| Hybrid MDWT-Shearlet | 8.9142 | 8.8012 | 59.7314 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
