Optimized Convolutional Fusion for Multimodal Neuroimaging in Alzheimer’s Disease Diagnosis: Enhancing Data Integration and Feature Extraction

Multimodal neuroimaging has gained traction in Alzheimer's Disease (AD) diagnosis by integrating information from multiple imaging modalities to enhance classification accuracy. However, effectively handling heterogeneous data sources and overcoming the challenges posed by multiscale transform methods remains a significant hurdle. This article proposes a novel approach to address these challenges. To harness the power of diverse neuroimaging data, we employ a strategy that leverages optimized convolution techniques. These optimizations include varying kernel sizes and the incorporation of instance normalization, both of which play crucial roles in feature extraction from magnetic resonance imaging (MRI) and positron emission tomography (PET) images. Specifically, varying kernel sizes allow us to adapt the receptive field to different image characteristics, enhancing the model's ability to capture relevant information. Furthermore, we employ transposed convolution, which increases the spatial resolution of feature maps and is likewise optimized with varying kernel sizes and instance normalization. This heightened resolution facilitates the alignment and integration of disparate MRI and PET data. The use of larger kernels and strides in transposed convolution expands the receptive field, enabling the model to capture essential cross-modal relationships. Instance normalization, applied to each modality during the fusion process, mitigates potential biases stemming from differences in intensity, contrast, or scale between modalities. This enhancement contributes to improved model performance by reducing complexity and ensuring robust fusion.
The performance of the proposed fusion method is assessed on three distinct neuroimaging datasets: the Alzheimer's Disease Neuroimaging Initiative (ADNI), consisting of 50 participants per stage of AD for both MRI and PET (Cognitive Normal, AD, and Early Mild Cognitive Impairment); the Open Access Series of Imaging Studies (OASIS), consisting of 50 participants per stage for both MRI and PET (Cognitive Normal, Mild Dementia, and Very Mild Dementia); and the whole-brain atlas neuroimaging database (AANLIB), consisting of 50 participants per stage for both MRI and PET (Cognitive Normal and AD). To evaluate the quality of the fused images generated via our method, we employ a comprehensive set of evaluation metrics, including the Structural Similarity Index Measure (SSIM), which assesses the structural similarity between two images; the Peak Signal-to-Noise Ratio (PSNR), which measures how closely the generated image resembles the ground truth; Entropy (E), which assesses the amount of information preserved or lost during fusion; the Feature Similarity Indexing Method (FSIM), which assesses the structural and feature similarities between two images; and Edge-Based Similarity (EBS), which measures the similarity of edges between the fused and ground truth images. The obtained fused image is further evaluated using a Mobile Vision Transformer. In the classification of AD vs. Cognitive Normal, the model achieved an accuracy of 99.00%, a specificity of 99.00%, and a sensitivity of 98.44% on the AANLIB dataset.


Introduction
Alzheimer's Disease (AD) is a neurodegenerative disorder characterized by progressive cognitive decline and memory impairment. Early and accurate diagnosis of AD is crucial for effective intervention and treatment planning. Patients with AD progress to dementia and lose physiological functions, eventually leading to death. An estimated 55 million people worldwide have dementia, and more than 60% of them live in low- and middle-income countries [1]. It is anticipated that this will increase to 78 million by 2030 and rise to 139 million by 2050 [1]. Metabolic changes in the brain and significant atrophy contribute to the neurodegenerative processes observed in AD. Magnetic Resonance Imaging (MRI) and Positron Emission Tomography (PET) have become very useful for studying the structural and functional changes linked to Alzheimer's Disease in the neuroimaging field [2].
The pathogenic nature of Alzheimer's disease manifests in the brain as structural alterations, including anatomical location, cortical thickness, volumetry, and other morphological features [3]. The capacity to quantify these morphological traits using MRI has resulted in an explosion of methodological research in predicting and categorizing AD. MR images show exceptional anatomical information, and hippocampal shrinkage evaluated on high-resolution T1-weighted MRI is an important criterion for the clinical diagnosis of Alzheimer's disease [4]. For example, volumetric features of sMRI data were utilized for the classification of Early Mild Cognitive Impairment (EMCI) vs. Cognitive Normal (CN) [5]. sMRI cortical thickness and its underlying geometric information were employed for the early detection of AD [6]. The most useful spatial features of gray matter (GM) were extracted from sMRI and further segmented into ninety regions for Late Mild Cognitive Impairment (LMCI) vs. EMCI classification [7]. Extracted GM images from sMRI using a CNN architecture were used for the diagnosis and classification of the CN, EMCI, and LMCI groups [8]. While structural imaging captures downstream pathological changes, it is not appropriate for reflecting changes that precede protein deposition [9]. PET imaging with 18F-fluorodeoxyglucose (FDG-PET) can capture brain metabolism characteristics to aid in the detection of lesions for AD classification [10]. For example, FDG-PET data were used for the automated classification of AD groups [11]. The risk of AD was predicted with a deep learning model by extracting FDG-PET image features [12]. The fusion of various imaging modalities in multimodal neuroimaging holds the promise of offering comprehensive insights into the metabolic and structural changes occurring in AD [9,13,14].
In recent years, there has been growing interest in leveraging multimodal neuroimaging data to enhance the accuracy of AD classification. Combining information from multiple imaging modalities can provide a more comprehensive understanding of AD [7,15-17] by capturing complementary aspects of brain alterations that may not be evident in a single modality. However, effectively integrating these heterogeneous data sources presents a considerable challenge.
A rising number of studies have used MRI and PET data to discover multilevel and multimodal properties by translating regional brain images into higher-level, more compact characteristics. For example, to classify AD, a new composite image was generated by blending the Gray Matter (GM) tissue region of the brain in both MRI and FDG-PET images using mask and registration coding techniques [18]. Likewise, researchers have classified individuals with AD according to their GM density and glucose utilization from MRI and PET, allowing for a more comprehensive and accurate diagnosis of AD [19]. Furthermore, a novel multimodal image-fusion technique designed to merge PET and MRI data was introduced, in which the extracted features are subsequently input into an ensemble classifier [17]. While the automatic pipeline method described in their study utilized techniques such as FreeSurfer and affine registration for pixel-level fusion, achieving precise alignment and ensuring that the combined information accurately reflects the underlying neurobiological changes was a challenge. Furthermore, the extraction of relevant features from these fused modalities introduced complexities in terms of feature selection and interpretability. Although the study employed a range of techniques, including ANOVA, scalar methods, and machine learning classifiers, to identify prominent features, the process of discerning which specific features contribute most significantly to the accurate classification of AD stages remained challenging. The three-channel phase feature learning model demonstrated promise in integrating and learning latent representations from multimodal neuroimaging data, even in the presence of data heterogeneity [10]. However, the only partial resolution of the heterogeneity issue highlighted the complexity of reconciling the distinct characteristics of PET and MRI data within a unified framework.
The multimodality latent-space-inducing ensemble Support Vector Machine (SVM) classifier demonstrated the potential for improved AD classification accuracy. The study's exploration of an ensemble SVM classifier that induced a latent space via multimodal MRI and PET inputs offered a promising avenue for enhancing AD classification accuracy. However, the intricacy lay in the requirement to effectively reconcile and encapsulate the inherent interrelationships present in the MRI and PET modalities within this latent space [20]. An image fusion method to combine MRI and PET images into a composite Gray Matter-PET modality for AD diagnosis has been proposed [21]. While the image fusion method demonstrated superior overall performance compared to unimodal methods, it was noted that its performance in terms of sensitivity and specificity sometimes fell short of optimal levels. The fusion process might introduce subtle distortions or uncertainties that affect diagnostic accuracy.
Additionally, researchers have explored the use of multiscale transform approaches to integrate information from multiple modalities consisting of MRI and PET imaging data in the field of multimodal neuroimaging for the diagnosis of AD. Wavelet-based fusion was used to bring together information from the MRI and PET scans to improve spatial resolution and yielded an image with metabolic and anatomical detail coupled with the finest resolution. Despite the coregistration and alignment process, variability in image resolution and information content between MRI and PET was still a challenge [22]. Ensuring accurate registration and fusion requires overcoming differences in image resolution, contrast, and acquisition protocols [23]. The computational complexity of processing and analyzing data across multiple dimensions is a further disadvantage of multiscale transform methods. These methods frequently involve intricate mathematical algorithms and intensive computational operations, making them time-consuming and resource-intensive [24]. In contrast to multilevel feature learning and multiscale transform approaches for multimodal neuroimaging fusion, our proposed method leverages pre-trained convolutional neural networks. Consequently, it requires neither a specific dataset for training nor a specialized network. The proposed network is fed with source images, and optimized feature maps are extracted at layer 1 [25]; Maximum Fusion (MF) is used to fuse the extracted features. Vision transformers have garnered significant attention in the field of computer vision due to their remarkable performance in image classification tasks, demonstrating their ability to capture long-range dependencies within images [26]. Simple ViT models trained faster and better than the original [27]; the performance of ViTs saturated faster when scaled to be deeper and improved image classification accuracy [28]; and MobileViT allowed for lightweight global processing of information with transformers [29]. Leveraging the success of vision transformers, we employ fused images as inputs to train a vision transformer model for AD classification.
The research paper's primary contributions are as follows: (1) an MF strategy is designed to fuse feature maps of the same depth from MRI and PET; (2) an optimized convolution technique, including variations in kernel size to increase the receptive field and the addition of instance normalization to avoid bias towards either modality, is designed to improve the alignment and integration of MRI and PET data; (3) a novel fusion model of MRI and PET images using the MF strategy and an optimized convolution network is proposed, which overcomes the shortcomings of multiscale transform fusion methods and enhances the receptive field of the network.

Materials and Methods
The Laplacian transform, a mathematical technique for accentuating intricate image details and extracting edge and texture characteristics [30,31], emerges as a pivotal tool in the fusion of MRI and PET images. In this study, MRI and FDG-PET images are collected from three databases (ADNI, OASIS, and AANLIB), with the Laplacian transform demonstrating its significance in efficiently revealing finer aspects of the images, thereby contributing to the improved fusion of MRI and PET modalities. This technique's potential is well-founded in its ability to capture and highlight subtle features within medical images, aligning with the demands of contemporary image fusion methodologies [32]. This paper uses Laplace sharpening to obtain the fine details of the image. Transposed convolution is optimized by varying the kernel size, and instance normalization is used to extract relevant feature maps from MRI and PET images. The feature maps obtained via MRI and PET are fused using MF to emphasize the strongest activations between the MRI and PET modalities. The proposed fusion model is shown in Figure 1.


Datasets
This study utilized MRI and PET images obtained from the official website of Harvard University (http://www.med.harvard.edu/AANLIB/home.html), the ADNI website (https://adni.loni.usc.edu), and the OASIS website. The brain images under consideration are categorized into two distinct stages, namely Cognitive Normal (CN) and Alzheimer's Disease (AD). Fifty images of each stage were downloaded from each website, making a total of 300 images. PET images are in red, green, and blue (RGB), while MRI images are in black and white.

Laplace Sharpening
Fine details are obtained from the source image Sn (n = 1 denotes the MRI image; n = 2 denotes the PET image). Sn with pixel intensities is represented as Sn(x, y, c), where (x, y) are the coordinates and c is the color channel [30]. A Gaussian blur kernel, given by Equation (1), is applied to the source image Sn(x, y, c). The blurred image is then obtained by convolving the source image with the Gaussian blur kernel; this process is expressed in Equation (2), where i and j iterate over the 3 × 3 neighborhood of the pixel at position (x, y, c).
To obtain the finer details, a Laplacian kernel, given by Equation (3), is utilized. By convolving the Laplacian kernel from Equation (3) with the previously obtained blurred image, we generate the Laplacian image; this process is outlined in Equation (4). The sharpened image is derived by combining the source image Sn(x, y, c) with the Laplacian-filtered image, with the combination controlled by an enhancement factor k, as given by Equation (5). Lastly, to ensure that pixel values lie within the valid range [0, 1], the sharpened image is clipped according to Equation (6). Sampled sharpened images from the AANLIB database are shown in Figure 5.
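A minimal NumPy sketch of this sharpening pipeline follows. The specific kernel coefficients and the default enhancement factor k are illustrative assumptions (standard choices), since the paper's exact values are not reproduced here:

```python
import numpy as np

# Assumed 3x3 Gaussian blur kernel (Equation (1)), normalized to sum to 1.
GAUSS = np.array([[1, 2, 1],
                  [2, 4, 2],
                  [1, 2, 1]], dtype=float) / 16.0

# Assumed 3x3 Laplacian kernel (Equation (3)); standard 4-neighbour form.
LAPLACE = np.array([[0, -1, 0],
                    [-1, 4, -1],
                    [0, -1, 0]], dtype=float)

def conv3x3(img, kernel):
    """Convolve a single channel with a symmetric 3x3 kernel (edge padding)."""
    padded = np.pad(img, 1, mode="edge")
    out = np.empty_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * kernel)
    return out

def laplace_sharpen(source, k=0.5):
    """Sharpen one channel of S_n(x, y, c), intensities assumed in [0, 1]."""
    blurred = conv3x3(source, GAUSS)      # Equation (2): blur the source
    lap = conv3x3(blurred, LAPLACE)       # Equation (4): Laplacian of the blur
    sharpened = source + k * lap          # Equation (5): add scaled details
    return np.clip(sharpened, 0.0, 1.0)   # Equation (6): clip to valid range
```

Applied per channel, this reproduces the blur-Laplacian-enhance-clip sequence described above; a flat image passes through unchanged because the Laplacian of a constant region is zero.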



Basic Image Feature Map Extraction Based on Optimized Transposition Convolution
The proposed method leverages the VGG19 network as a backbone for feature extraction due to its exceptional performance in various computer vision tasks. Previous research in the field of image fusion has reported successful results using VGG19 for image fusion tasks. For instance, infrared and visible images were integrated using VGG19 to extract relevant features, and the MF rule was utilized for the final fused image [33,34]. Likewise, transposed convolution was used for upsampling in image resolution to enhance effective feature extraction [35]. In our proposed study, the initial step involves applying transposed convolution after the first convolution layer [36] by varying kernel sizes. Traditional transposed convolution uses fixed-size kernels for upsampling, which might not capture all levels of detail effectively. By varying the kernel size, the convolution operation can adapt to different spatial scales present in the input image [37,38]. This adaptability is particularly useful for retaining fine-grained information while ensuring that larger structures are also captured [39]. Equation (7) gives the transposition operation on the input images.
In Equation (7), X can be either MRI or PET, and the equation effectively captures the process of obtaining the feature maps from both images using the same convolutional operation. Kernel(m, n) is the kernel centered at position (m, n); Trans represents the output of the transposition process, while i and j represent the row and column indices of the output feature map, respectively. The variable k is the half-size of the kernel (the kernel radius), indicating the distance from the center pixel that must be considered in the summation, and Input(i − m, j − n) refers to the pixel value of the input image at the relative position (i − m, j − n).
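One common way to realize the upsampling behavior described by Equation (7) is zero-insertion between input pixels followed by a correlation with a kernel of chosen size. The sketch below is an illustrative single-channel NumPy version under that assumption (odd kernel sizes, zero padding), not the paper's implementation:

```python
import numpy as np

def transposed_conv2d(x, kernel, stride=2):
    """Minimal transposed convolution: insert (stride - 1) zeros between
    input pixels, then slide an odd-sized kernel over the result.
    Varying `kernel.shape` changes the receptive field of the upsampling."""
    h, w = x.shape
    kh, kw = kernel.shape
    # Zero-insertion upsampling: output grid is stride times larger.
    up = np.zeros((h * stride, w * stride))
    up[::stride, ::stride] = x
    # Pad by the kernel radius so the output keeps the upsampled size.
    ph, pw = kh // 2, kw // 2
    padded = np.pad(up, ((ph, ph), (pw, pw)))
    out = np.empty_like(up)
    for i in range(up.shape[0]):
        for j in range(up.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out
```

A 2 × 2 input with stride 2 yields a 4 × 4 feature map; enlarging the kernel spreads each input value over a wider neighborhood, which is the mechanism the paper exploits by varying kernel sizes.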
After the transposed convolution operation, instance normalization without learnable parameters is applied across the height and width dimensions to the feature maps corresponding to the MRI and PET modalities generated at the higher resolution. Instance normalization without learnable parameters is a variant of instance normalization in which the scaling and shifting factors are not learned but are instead fixed and applied in a predetermined manner [40,41]. In this study, instance normalization helps normalize the activations of individual instances independently, without introducing any learnable parameters. Normalizing each modality ensures that the fusion process is not biased towards either modality due to differences in intensity, contrast, or scale. Also, since this form of instance normalization does not require the learning of scaling and shifting parameters, it can lead to a reduction in model complexity and improve the performance of image fusion [42].
The instance normalization is represented as follows. Step 1: the mean µ_ti and variance σ²_ti of the feature map Trans(i, j) across the height (H) and width (W) dimensions are calculated separately for each instance t, as given in Equations (8) and (9). Step 2: the feature map Trans(i, j) is normalized using the mean and variance computed in Equations (8) and (9), where y_ijk is the normalized output at position (i, j) and channel k, Trans(i, j) represents the pixel value of the feature map at that position, µ_ti and σ²_ti are the mean and variance computed in Step 1, and ε is a small constant added to the denominator to avoid division by zero. Note that the normalization is applied separately to both the MRI and PET feature maps to ensure that each feature map is normalized across its channel dimensions.
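The two steps can be sketched compactly in NumPy; a (C, H, W) feature-map layout is assumed here for illustration:

```python
import numpy as np

def instance_norm(feat, eps=1e-5):
    """Parameter-free instance normalization of a (C, H, W) feature map:
    each channel is standardized over its spatial (H, W) dimensions,
    with no learnable scale or shift."""
    mu = feat.mean(axis=(1, 2), keepdims=True)    # Step 1: per-channel mean
    var = feat.var(axis=(1, 2), keepdims=True)    # Step 1: per-channel variance
    return (feat - mu) / np.sqrt(var + eps)       # Step 2: y = (x - mu)/sqrt(var + eps)
```

Calling this separately on the MRI and PET feature maps, as the text describes, gives both modalities zero mean and unit variance per channel before fusion, so neither dominates through raw intensity or contrast.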

Basic Image Fusion Strategy Using the MF
After normalizing the feature maps separately for MRI and PET using instance normalization, MF is applied to combine the normalized feature maps. This strategy aims to leverage complementary information from both MRI and PET modalities. The basic idea is to take the maximum value between the corresponding elements of the normalized MRI and PET feature maps, pixel-wise, to create a fused feature map [43].
For each pixel position (i, j) and channel (k), the maximum value between the normalized MRI and PET feature maps is taken. This Maximum Fusion strategy ensures that the fused feature map retains the strongest features from both the MRI and PET modalities, leveraging the strengths of each image while minimizing the impact of less relevant information. The fused feature map is further fed into a Vision Transformer for the classification of AD stages.
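The pixel-wise rule above reduces to a single element-wise maximum; a minimal sketch, assuming the two normalized feature maps share a common shape:

```python
import numpy as np

def maximum_fusion(mri_feat, pet_feat):
    """Element-wise Maximum Fusion of two equally shaped, normalized
    feature maps: fused(i, j, k) = max(mri(i, j, k), pet(i, j, k))."""
    assert mri_feat.shape == pet_feat.shape, "feature maps must be aligned"
    return np.maximum(mri_feat, pet_feat)
```

Because both inputs are instance-normalized first, "maximum" here selects the stronger relative activation at each position rather than the brighter raw modality.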

Experiment and Result Analysis
To verify the effectiveness and advancement of the proposed fusion method in different scenarios, we conducted a rigorous set of evaluations and experiments that leveraged its intrinsic adaptability and generalization capabilities. Notably, our approach capitalizes on the unique ability to fuse information from single data instances, rendering it suitable for scenarios where large training datasets might be unavailable or impractical to create. By focusing on single data instances without requiring prior training, we ensured that the method's performance was consistently assessed across three different databases. We adopted the same network size as in [36] for the fusion process. To effectively assess its performance, we adopted a holistic approach based on quantitative metric assessments. Quantitative analysis involves calculating relevant performance metrics, such as PSNR, SSIM, FSIM, E, and EBS, among others [44,45]. These metrics allowed us to quantitatively measure the fidelity, preservation of salient features, and alignment with ground truth information. The PSNR values indicate the quality of the fused images in terms of noise and distortion, where higher values suggest better quality. The SSIM values indicate the structural similarity between the original and fused images, with values closer to 1 indicating better similarity. The metrics E, FSIM, and EBS provide insights into various aspects of image quality, such as edge preservation and structural information. These metrics assess the performance of the fusion strategy in different dimensions. At the same time, the proposed algorithm is compared with the results of two other typical fusion methods on the datasets from the selected databases. Sample fused images are displayed in Figure 6.
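Two of these metrics, PSNR and entropy, have compact closed forms and can be sketched as follows. Images are assumed scaled to [0, 1], and the histogram bin count is an illustrative choice; the study's exact implementations may differ:

```python
import numpy as np

def psnr(ref, fused, peak=1.0):
    """Peak Signal-to-Noise Ratio in dB: 10 * log10(peak^2 / MSE)."""
    mse = np.mean((ref - fused) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def entropy(img, bins=256):
    """Shannon entropy (bits) of the intensity histogram of an image
    in [0, 1]; higher values mean more information content."""
    hist, _ = np.histogram(img, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]                      # drop empty bins (0 * log 0 := 0)
    return -np.sum(p * np.log2(p))
```

For example, a fused image offset from the reference by a uniform 0.1 gives an MSE of 0.01 and hence a PSNR of 20 dB, while a perfectly flat image has zero entropy.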

All experiments were conducted using the Python programming language and executed on a GPU-accelerated system. Tables 1-3 depict the quantitative evaluation values of the fusion images
corresponding to twelve typical source images from AD stages in the selected databases. The overall evaluation metrics, namely the SSIM, FSIM, E, EBS, and PSNR of the proposed fusion model and the multiscale transform methods, are depicted in Figures 7-9. To assess the generalization capability and performance of the fused data on individual modalities, MViT is trained and validated using the fused data and then tested with unseen fused data from ADNI and AANLIB. The MViT model is trained on the fused data from ADNI and AANLIB separately, i.e., 150 fused images from ADNI and 100 fused images from AANLIB, as shown in Table 4. This study leveraged the hyperparameters in [29] and fine-tuned them for the proposed model's AD stage classification. The hyperparameters used were a learning rate of 0.0002 and a weight decay of 0.01. The stochastic gradient descent optimizer performed better than the adaptive optimizer with weight decay in our study.
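The update rule behind the reported optimizer choice (SGD with L2 weight decay, learning rate 0.0002, weight decay 0.01) can be sketched as follows. This is an illustrative NumPy version of one plain SGD step, not the training code used in the study:

```python
import numpy as np

def sgd_step(params, grads, lr=0.0002, weight_decay=0.01):
    """One SGD step with L2 weight decay, using the hyperparameters
    reported for MViT fine-tuning:
        p <- p - lr * (g + weight_decay * p)
    `params` and `grads` are parallel lists of NumPy arrays."""
    return [p - lr * (g + weight_decay * p) for p, g in zip(params, grads)]
```

With these small learning-rate and decay values, each step nudges the weights gently toward the gradient direction while shrinking them slightly, which matches the conservative fine-tuning regime the text describes.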

Discussion
From Table 1, it can be observed that the proposed fusion strategy generally performs better compared with DWT and LPG in terms of most metrics across both datasets. This indicates that the proposed fusion method leveraging the VGG19 model tends to preserve more image details, maintain better structural similarity, and produce higher-quality fused images. Furthermore, the results demonstrate that the proposed fusion strategy exhibits effectiveness and advancement across AD and CN without the need for explicit model training. Likewise, the results in Table 2 also show consistency in the performance trend across different stages of AD from ADNI. This suggests the robustness and general applicability of the proposed fusion method across diverse datasets, showcasing its effectiveness for various image fusion scenarios.
Figures 7-9 show that the proposed model achieves a PSNR of 32.55, an SSIM of 0.83, an E of 4.13, an FSIM of 0.96, and an EBS of 0.75 on the AANLIB dataset. Comparing these metrics with the other fusion models, the proposed model outperforms DWT and LPG across most metrics, indicating superior image quality and feature preservation. On the ADNI dataset, the proposed model achieves a PSNR of 30.39, an SSIM of 0.70, an E of 6.78, an FSIM of 0.93, and an EBS of 0.67. Again, the proposed model generally performs better than the other methods, showcasing its potential for accurately fusing MRI and PET data. The proposed model scores a PSNR of 28.58, an SSIM of 0.67, an E of 6.83, an FSIM of 0.85, and an EBS of 0.71 on the OASIS dataset. While still performing favorably, the differences in metrics between the models are less pronounced on this dataset. Across all three datasets, the proposed model consistently outperforms both DWT and LPG. This implies that the proposed fusion method captures more meaningful information from both MRI and PET modalities, resulting in images that are better suited for subsequent analysis or diagnosis. While DWT and LPG might have their advantages, the results indicate that the proposed model is more effective for the specific task of fusing MRI and PET data for AD vs. CN classification.
The clinical implication of these results lies in the ability of the image fusion model to aid in the classification of individuals with AD vs. those who are CN. Generally, higher performance metrics (higher PSNR, SSIM, FSIM, E, and EBS values) indicate better image fusion quality. This better fusion quality can potentially enhance the ability to detect patterns and features that are crucial for accurate classification. Figure 12 shows that the performance of the fused image in the classification of AD vs. CN using MRI test data from AANLIB provides a precision of 100% (AD) and 98% (CN), a recall of 97% (AD) and 100% (CN), and an F1-score of 99% (AD) and 99% (CN), indicating that the model performs very well on the AANLIB MRI test dataset. It has high precision, recall, and F1-score values for both classes, which suggests that the model is effective in correctly classifying instances from both classes. The model's performance on the AANLIB PET test dataset is similar to that on the AANLIB MRI dataset. The model's performance on the ADNI MRI test dataset is good but not as high as on the AANLIB datasets. The performance of the fused image in the classification of AD vs. CN using MRI test data from ADNI provides precision values of 91% (AD) and 100% (CN), recalls of 100% (AD) and 93% (CN), and F1-scores of 95% (AD) and 96% (CN). The model's performance on the ADNI PET test dataset is similar to that on the ADNI MRI test dataset. It achieves high precision and recall for both classes, but the recall for class AD is comparatively low (88%), affecting the overall F1-score for that class.
However, there are some variations in performance across datasets. The model performs slightly better on the AANLIB datasets compared with the ADNI datasets. Additionally, the performance metrics for class AD are relatively low compared with those for class CN in some cases, suggesting that the model might struggle more with the AD class. Figure 14 shows that the AUC values, which measure the model's ability to discriminate between positive and negative classes, are generally high. The AANLIB MRI and AANLIB PET datasets have the highest AUC values, suggesting excellent discriminative power. The ADNI MRI test dataset has a perfect AUC (100%) but slightly lower accuracy compared with the AANLIB datasets. Likewise, the OASIS MRI test data have a perfect AUC (100%). The ADNI PET dataset has a relatively low AUC (90%) compared with the other datasets, which might indicate that the model has slightly more difficulty distinguishing between classes. The high recall values indicate that the model is effective at identifying most samples of the positive class, which is crucial in medical diagnosis in order to avoid false negatives. The high AUC values imply that the model can effectively distinguish between different classes, which is vital for reliable diagnostic decisions. However, while the model's performance is promising, in clinical decisions the model should be used as a supportive tool for medical professionals to aid in diagnosis and decision making. A summary of the proposed model's performance is depicted in Table 5. The results achieved in this article are compared and validated against recent research conducted on AD detection using multimodal neuroimaging in Table 5.
Table 6 presents a comparison between the proposed method and several existing methods for binary AD vs. CN classification using the ADNI database. The proposed method demonstrates competitive performance compared with existing methods. While some methods achieve higher accuracy, the proposed method maintains a good balance between specificity and sensitivity, which is crucial in medical diagnosis scenarios. Additionally, the proposed method's performance on the AANLIB dataset suggests its potential for generalizability across different datasets.

Conclusions
In this research study, we proposed a novel fusion method for multimodal neuroimaging data consisting of MRI and PET images to enhance the accuracy of Alzheimer's Disease classification. The method leveraged a Maximum Fusion strategy and an optimized convolution network, effectively combining the complementary information from both modalities. The fusion approach demonstrated its effectiveness across multiple datasets and AD stages without the need for explicit model training. Our comprehensive experiments and result analysis showcased the superior performance of the proposed fusion method compared with traditional fusion methods such as the Discrete Wavelet Transform and the Laplacian Pyramid Gaussian in terms of various quantitative metrics. The fused images were used to train the Mobile Vision Transformer for the classification of Alzheimer's Disease vs. Cognitive Normal, and the proposed model's classification accuracy, precision, recall, F1-score, and AUC values demonstrated its effectiveness.
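At its core, a Maximum Fusion step keeps, at each spatial location, the strongest response across the two modalities. A minimal sketch of this idea, assuming the MRI and PET inputs have already been co-registered, converted to a single channel, and intensity-normalized (the full method additionally applies the optimized convolution network described earlier):

```python
import numpy as np

def max_fuse(mri, pet):
    """Pixel-wise maximum fusion of two co-registered, single-channel,
    intensity-normalized images. A simplified stand-in for the paper's
    Maximum Fusion step; preprocessing is assumed done upstream."""
    if mri.shape != pet.shape:
        raise ValueError("inputs must be co-registered to the same shape")
    return np.maximum(mri, pet)
```

Because the rule is a fixed element-wise operation, it needs no training, which is consistent with the statement above that the fusion stage requires no explicit model training.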
Comparisons with existing methods highlighted the competitive nature of our proposed approach, maintaining a good balance between the specificity and sensitivity crucial for medical diagnosis scenarios. The fusion model's clinical implication lies in its potential to aid medical professionals in the accurate classification of AD vs. CN patients. One major limitation is the absence of real clinical data for validation, which could impact the model's performance when applied to real-world scenarios. Additionally, the proposed model's performance might be influenced by variations in data acquisition protocols and scanner characteristics in clinical settings. Future research could focus on validating the method using diverse clinical datasets and addressing the limitations observed in this study.

Figure 1 .
Figure 1.Framework of Proposed Fusion Method.

2.1. Datasets
This study utilized MRI and PET images obtained from the official website of Harvard University (http://www.med.harvard.edu/AANLIB/home.html (accessed on 15 September 2023)), the ADNI website (https://adni.loni.usc.edu (accessed on 15 September 2023)), and the OASIS website. The brain images under consideration are categorized into two distinct stages, namely Cognitive Normal (CN) and Alzheimer's Disease (AD). Fifty images of each stage were downloaded from each website, making a total of 300 images. PET images are in red, green, and blue (RGB), while MRI images are in black and white. Figures 2-4 show samples of the datasets used from the AANLIB, ADNI, and OASIS databases, respectively.
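Since the PET images are RGB while the MRI images are single-channel, a common preprocessing step is to collapse the PET slices to a luminance channel before the two modalities are compared or fused. A minimal sketch of one conventional way to do this (the BT.601 luminance weights are an assumption; the paper does not specify its conversion):

```python
import numpy as np

def to_grayscale(rgb):
    """Collapse an RGB image (H, W, 3) or pixel (3,) to luminance using
    ITU-R BT.601 weights, so RGB PET slices match single-channel MRI.
    An assumed preprocessing step, not taken from the paper."""
    return rgb @ np.array([0.299, 0.587, 0.114])
```

The weights sum to 1, so intensities stay in the original value range after conversion.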

Figure 2 .
Figure 2. Sample Images from the AANLIB database.

Figure 3 .
Figure 3. Sample Images from the ADNI database.

Figure 4 .
Figure 4. Sample Images from OASIS Database.


Figures 7-9 show the quantitative evaluation values of the fused images corresponding to twelve typical source images at the AD stages in the selected databases. The overall evaluation metrics (FSIM, E, EBS, and PSNR) of the proposed fusion model and the multiscale transform methods are depicted in Figures 7-9.
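Two of the quantitative metrics used here, PSNR and entropy, have simple closed forms. A minimal sketch of how they are typically computed for 8-bit images (function names are illustrative; the paper's exact implementation is not specified):

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB: how closely `test` matches `ref`.
    Identical images give infinite PSNR."""
    mse = np.mean((ref.astype(float) - test.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(peak ** 2 / mse)

def entropy(img, bins=256):
    """Shannon entropy (bits) of the intensity histogram: how much
    information the image carries after fusion."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))
```

SSIM, FSIM, and EBS involve local structural comparisons rather than single global formulas, so they are better taken from an established image-quality library than re-implemented.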

Figure 9 .
Figure 9. Average Metrics Value of OASIS.

Figure 10 .
Figure 10. Training Accuracy, Validation Accuracy, Training Loss, and Validation Loss using Fused Data from AANLIB.

Figure 11 .
Figure 11.Training Accuracy, Validation Accuracy, Training Loss, and Validation Loss using Fused Data from ADNI.

Figure 12 .
Figure 12.Training Accuracy, Validation Accuracy, Training Loss, and Validation Loss using Fused Data from OASIS.

Table 1 .
Quantitative Evaluation Values of Fused Images from AANLIB Database.

Table 2 .
Quantitative Evaluation Values of Fused Images from the ADNI Database.

Table 3 .
Quantitative Evaluation Values of Fused Images from the OASIS Database.

Table 4 .
Details of Source Images, Fused Images, and Augmented Images.

Table 5 .
Summary of Proposed Model Performance.

Table 6 .
Comparison of Proposed Method with Existing Methods.