VBM-Based Alzheimer’s Disease Detection from the Region of Interest of T1 MRI with Supportive Gaussian Smoothing and a Bayesian Regularized Neural Network

This paper presents an efficient computer-aided diagnosis (CAD) approach for the automatic detection of Alzheimer's disease in patients' T1 MRI scans using voxel-based morphometry (VBM) analysis of regions of interest (ROIs) in the brain. The idea is to generate a normal distribution of feature vectors from the ROIs, which are later used for classification via a Bayesian regularized neural network (BR-NN). The first dataset consists of the magnetic resonance imaging (MRI) scans of 74 Alzheimer's disease (AD), 42 mild cognitive impairment (MCI), and 74 control normal (CN) subjects from the ADNI dataset. The other dataset consists of the MRI scans of 42 Alzheimer's disease dementia (ADD), 42 normal controls (NCs), and 39 MCI due to AD (mAD) subjects from our GARD database. We aim to create a generalized network to distinguish normal individuals (CN/NC) from dementia patients (AD/ADD and MCI/mAD). Our performance relies on our feature extraction and data smoothing processes. The key step is to generate a Statistical Parametric Mapping (SPM) t-map image from the VBM analysis and obtain the regions of interest (ROIs) that show the most significant group differences after two-sample t-tests at p < 0.001 (AD vs. CN). The results showed a strong distinction between AD/ADD and CN/NC, validating our choice of discriminative MRI features. Further, we compared our performance with other recent state-of-the-art methods, and it is better in many cases. We experimented with two datasets to validate the process. To validate network generalization, the BR-NN is trained on 70% of the ADNI dataset and tested on the remaining 30% of the ADNI dataset and 100% of the GARD dataset, and vice versa. Additionally, we identified the brain anatomical ROIs that may be relatively responsible for brain atrophy during AD diagnosis.


Introduction
Alzheimer's disease is a neurodegenerative disorder causing dementia, which frequently affects elderly people. The disease mainly affects the brain and its vital parts. In this disease, various regions of the brain are affected, causing significant changes in the structural and functional regions of the brain [1,2]. With the breakthrough of various imaging techniques such as MRI, positron emission tomography (PET), and computed tomography (CT) scans in the medical sector, numerous efforts have been made to process, simulate, and interpret the results for computer-aided diagnosis (CAD), which will be crucial for medical professionals. Similarly, various studies using structural and functional MRI as the primary biomarker have been performed to efficiently develop a CAD system for Alzheimer's disease diagnosis and detection. MRI is a magnetic-field gradient-based neuroimaging technique that provides anatomic and physiological information for the diagnosis of different parts of the body, including the brain. It uses a strong magnetic field and radio waves to generate high-quality pictures of the structure and volume of the brain, called structural MRI (sMRI). Although sMRI does not show the functional activity of the brain, it can reflect the content of brain tissue, which can be useful to detect abnormalities in the brains of AD patients compared to healthy ones. These discriminative features lie in the content of white matter or ventricles, with changes in cortical thickness, hippocampus shape, and brain volume, which are probable structural features for detecting AD. In group comparisons, the voxel intensities of the clusters obtained after the t-test amplify these differences, eventually generating ROIs for feature extraction.
In neuroimaging, VBM [3] is used to investigate the anatomic morphology of the brain and its related parts using the intensity value of every voxel present in the MRI scan. It is generally used with different statistical parametric mapping approaches to analyze tissue contents between two or more groups of patients under different hypothesis tests. VBM extracts features of three tissue distributions, gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF), considering each voxel at a time. It follows the pipeline executed by John et al. [3] and later followed by Savio et al. [4], Busatto et al. [5], Frisoni et al. [6], and Beheshti et al. [7,8] for Alzheimer's-prone structural MRI classification. Savio et al. [4] applied VBM to the Open Access Series of Imaging Studies (OASIS) dataset to detect AD patients among NCs, and myotonic dystrophy type 1 (MD1) patients among healthy controls (HCs) in the dataset of the Neurology Department of the Donostia hospital. The extracted features were voxel values and the mean and standard deviation (MSD) from SPM t-map images. These features were classified using machine-learning tools such as the support vector machine (SVM), achieving a highest accuracy of 81%. Beheshti et al. [7] applied a similar technique to detect 130 AD and 130 HCs, extracting the voxel values from voxels of interest (VOI) and using a probability distribution function (PDF) for feature reduction (selection), later classified using an SVM with RBF kernel. They used the ADNI dataset and reported up to 90% accuracy from ten-fold cross-validation. Subsequently, Beheshti et al. [8] published a paper with a higher accuracy of up to 92.48% using seven feature-ranking methods for the same VOI-based features.
On the other hand, non-morphometry-based feature extraction techniques using classical methods such as the Dual-Tree Complex Wavelet Transform (DTCWT) [9], wavelet entropy [10], Singular Value Decomposition (SVD) [11], the Scaled Chaotic Artificial Bee Colony (SCABC) [12], and Downsized Kernel Principal Component Analysis (DKPCA) [13], as well as machine-learning and deep-learning techniques [14][15][16] such as recurrent neural networks and semantic segmentation [17,18], have been used in recent times in CAD development for AD detection between NC and MCI. Also, major advances in structural and functional neuroimaging studies have resulted in improvements in the early and accurate detection of AD [19][20][21][22][23][24].
In this study, we investigated the VBM method using SPM 12 for MRI classification. We combined the traditional VBM pipeline [3] with our proposed feature extraction process to support the classification of the normalized feature vectors. Following the VBM execution pipeline, we obtained a t-map image showing the areas where a distinction exists between the two groups (CN and AD/MCI) used in the t-test. The obtained t-map is used to identify ROIs, from which the features for classification are generated. To support the classifier with a normal distribution of features belonging to each group, we implemented Gaussian smoothing to generate weighted-average feature vectors without any feature reduction. Subsequently, the smoothed features are fed into the neural network along with their labels and trained based on Bayesian optimization. Once the network is trained, it is tested on other sets of data, and finally, the results are evaluated.

Image Acquisition
The T1-MR images and data used in this work were obtained from the Alzheimer's disease neuroimaging initiative (ADNI) database [25]. The selected sMRI were acquired at 1.5 T using a Siemens scanner and preprocessed using the University of California, San Diego (UCSD) ADNI pipeline in two steps for a three-dimensional (3D) Grad-warp [26,27] and B1 non-uniformity correction [28]. It is important to consider the work of Clifford et al. [29], who studied approaches for the standardization of ADNI MRI protocols and acquisition parameters for each imaging sequence, post-acquisition correction of image artifacts, and other technical non-uniformities for data variation minimization. The obtained MRI scans were in NIfTI (.nii format) 3D volumes, which can be processed for VBM analysis as illustrated in Figure 1.



Participants
The first dataset for our study is from ADNI and consists of 74 AD, 74 CN, and 40 MCI patients' scans. The training and testing sets were selectively adapted from Cuingnet et al. [2]. We selected a limited number of AD and NC subjects because comparatively few MCI subjects are available; to balance the classwise approximation of the classifier, we used a limited number of subjects in both datasets for the comparative study. We selected one MRI from each patient having a uniform dimension of 256 × 256 × 166 (to avoid a co-registration problem). The second dataset is from GARD and consists of 42 ADD, 42 NC, and 39 mAD. T1-weighted MRI scans are preferred in both datasets for the experiment, as most VBM-based work uses T1 scans, and it has been reported that contrast-enhanced T1-weighted MRI was superior to T2-weighted MRI for imaging assessment, although without a statistically significant difference [30]. The detailed demographics are shown in Table 1.

Methodology
We will discuss the proposed method thoroughly, following each step involved. Figure 2 provides a quick illustration of the proposed method.


VBM SPM Model
All MRI scans undergo the preprocessing described in Section 3.1.1, required to make them ready for SPM modeling. We used the GUI-based SPM toolbox for SPM model generation. Sections 3.1.1 to 3.1.2 are implemented manually using the SPM toolbox, as shown in Figure 3.


MRI Preprocessing for Voxel-Based Model
The major processes in VBM include: (i) spatial normalization and diffeomorphic anatomical registration through exponentiated lie algebra (DARTEL) registration, (ii) modulation and segmentation, and (iii) smoothing. Spatial normalization transforms all participants' volumes to the same stereotactic space for uniformity. The SPM 12 MATLAB toolbox is used for spatial registration. It uses the DARTEL [31] registration template, which was created from 555 participants of the IXI dataset [3] aged between 20 and 80 years. We used tissue probability maps (TPM) [32] as a reference for the initial spatial registration and segmentation. The final image was smoothed using a Gaussian smoothing kernel of size [8 8 8] to suppress noise and effects due to residual differences in functional and gyral-related anatomies during inter-participant averaging. Hence, a normalized-modulated-segmented smooth image with 1.5 mm voxels and dimensions 121 × 145 × 121 was finally formed for each tissue volume, i.e., GM, WM, and CSF. We considered the GM and WM volumes as the major input for the further mapping process.
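As an illustrative sketch (not SPM's MATLAB implementation), the [8 8 8] mm FWHM smoothing step can be reproduced with a separable 1-D Gaussian applied along each axis, converting FWHM in millimeters to sigma in voxels for the 1.5 mm voxel size; the synthetic volume size here is reduced for speed:

```python
import numpy as np

def gaussian_kernel_1d(fwhm_mm, voxel_mm, truncate=3.0):
    """1-D Gaussian kernel; FWHM in mm is converted to sigma in voxels."""
    sigma = fwhm_mm / (2.0 * np.sqrt(2.0 * np.log(2.0))) / voxel_mm
    radius = int(truncate * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def smooth_volume(vol, fwhm_mm=8.0, voxel_mm=1.5):
    """Separable 3-D Gaussian smoothing, applied axis by axis."""
    k = gaussian_kernel_1d(fwhm_mm, voxel_mm)
    out = vol.astype(np.float64)
    for axis in range(3):
        out = np.apply_along_axis(np.convolve, axis, out, k, "same")
    return out

# Toy tissue volume (the real maps are 121 x 145 x 121)
rng = np.random.default_rng(0)
vol = rng.random((24, 24, 24))
smoothed = smooth_volume(vol)
```

Smoothing suppresses voxel-level noise, so the smoothed volume has a much smaller standard deviation than the raw one while preserving coarse structure.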

SPM t-Map
Normalized, DARTEL-warped, modulated smooth GM and WM images were separately used to build a general linear model (GLM) to detect gray matter (or white matter) volume changes by performing a voxel-wise two-sample t-test (one group being CN and the other being MCI or AD) in SPM12 using Equation (1):

Y_i = X_i β + ε_i (1)

Here, Y_i is the predicted functional value for the ith participant and X_i (design matrix) is the vector of regressor variables for the ith participant. For the contrast test, X_i is set to [1 0] for group 1 and [0 1] for group 2 during the two-sample t-tests. β is a vector of parameters that varies from voxel to voxel and must be estimated after the model is built; in the two-sample t-tests, it represents the mean contrast of each group, including covariate values as well. ε_i is a normally distributed error term with zero mean and unit variance that varies across voxels. The total intracranial volume (TIV) was entered into the design matrix as a covariate, with an absolute threshold value of 0.2. This model was implemented using p < 0.001 and Gaussian random field (GRF) theory [33] for family-wise error (FWE) correction to produce a final t-map image. Here, the p-value indicates the probability of the null hypothesis being true; therefore, the lower the p-value, the better the model. Once the SPM t-map image was generated, we could concentrate ROI selection on those areas with GM volume changes detected by the voxel-based analysis using the contrast as a feature, as in Figures 3 and 4, for further feature extraction. We consider the selection and generation of the ROI to be the most crucial and result-oriented process; if the ROI is not chosen properly, an unfavorable result may be obtained.
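For a two-sample design with no covariates, the GLM contrast in Equation (1) reduces to a voxel-wise pooled-variance t-statistic. A numpy sketch on synthetic data is shown below; the group sizes, intensity values, and the simulated atrophy block are illustrative assumptions, not ADNI data:

```python
import numpy as np

def voxelwise_two_sample_t(group1, group2):
    """Voxel-wise two-sample t-statistic (pooled variance), vectorized over
    a (subjects, x, y, z) stack of modulated tissue volumes."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = group1.mean(axis=0), group2.mean(axis=0)
    v1 = group1.var(axis=0, ddof=1)
    v2 = group2.var(axis=0, ddof=1)
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)  # pooled variance
    se = np.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))
    with np.errstate(divide="ignore", invalid="ignore"):
        t = (m1 - m2) / se
    return np.nan_to_num(t)

rng = np.random.default_rng(0)
cn = rng.normal(0.6, 0.05, size=(74, 16, 16, 16))  # synthetic CN GM maps
ad = rng.normal(0.6, 0.05, size=(74, 16, 16, 16))
ad[:, 4:8, 4:8, 4:8] -= 0.1                        # simulated atrophy cluster
tmap = voxelwise_two_sample_t(cn, ad)
```

Thresholding `tmap` at the critical t-value for p < 0.001 (with FWE correction, as in the paper) would then isolate the atrophied block as a significant cluster.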

ROI Selection
ROI selection is performed after SPM modeling (Section 3.1.2); Sections 3.2.1 and 3.2.2 describe our proposed method for ROI selection, which is done manually as described in the respective sections. Corresponding results are shown as screenshots in Figures 3 and 4, respectively. Section 3.3 is performed via MarsBaR (an SPM toolbox extension) and manual selection. Each section explains in detail how the process is done and which parameters were selected for the final implementation.

From Each Cluster Generated into Separate ROI
These ROIs are identified based on the t-map generated from the VBM analysis. Regions that are significantly different and bundled into the same area are considered a cluster, as shown in Figures 4-6. The generated clusters vary in shape and size based on the significance of the p-values and the error correction used; the p-value was determined by repeated experiments over the range 0.1 to 0.0001, and the value yielding the highest voxel count, p < 0.001, is reported. Thus, at p < 0.001, 14 clusters were selected out of 30 clusters from the t-map of the GM volumes, and one cluster from the t-map of the WM volumes, both from the AD vs. CN two-sample t-tests. Similarly, three clusters were generated from the t-map of the GM volumes of the CN vs. MCI two-sample t-tests. The selected 18 clusters were used directly as ROIs to obtain the weighted mean voxel value from each participating MRI; we will term this ROI as ROI_CL hereinafter. The number of voxels in each ROI_CL differs abruptly, so if we accumulated all the voxels from every ROI, the total number of voxels would be substantially large; hence, a logical solution is to take the weighted mean voxel value, as stated in Equation (2), to represent each selected ROI by a single value per MRI, as demonstrated in Appendix A, Table A1.


From AAL Template Based on the Anatomic Region
Each participant's brain differs in terms of the structure and size of the anatomic regions. This can be more problematic when anatomical ROIs are defined from a single participant and applied to the remaining data, as significant anatomical inconsistency exists between participants. Hence, we selected anatomic regions with a higher number of voxels in each cluster. After obtaining those regions, we generated a mask from the automated anatomic labeling (AAL) [34] atlas for those anatomic regions and co-registered it with our t-map image. The AAL atlas is a digital human brain atlas with 116 labeled volumes indicating macroscopic brain structures. Hence, these masks can be used subsequently for feature extraction. The most significant regions of the brain based on the t-map image using AAL atlas labels are tabulated in Tables 2 and 3 for the ADNI and the GARD test conditions, respectively. From the tables, 14 anatomic regions were selected, and the standard ROIs were generated from the AAL atlas, as shown in Figure 7. We will term this ROI as ROI_AAL hereinafter. We applied the same weight-based mean calculation as in ROI_CL to obtain the mean features representing each ROI value from all MRIs, a few of which are demonstrated in Appendix A, Table A2.
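Building a binary ROI mask from a labeled atlas amounts to selecting voxels whose label belongs to the chosen anatomic regions. A minimal numpy sketch follows; the toy atlas, its dimensions, and the label IDs are illustrative assumptions (the real AAL atlas has 116 labeled regions):

```python
import numpy as np

def roi_mask_from_atlas(atlas, label_ids):
    """Binary ROI mask from a labeled atlas volume (e.g., AAL),
    selecting the anatomic regions identified from the t-map clusters."""
    return np.isin(atlas, label_ids)

# Hypothetical toy atlas: 0 = background, 37/38 = two regions of interest
atlas = np.zeros((16, 16, 16), dtype=np.int16)
atlas[2:6, 2:6, 2:6] = 37
atlas[10:14, 2:6, 2:6] = 38
mask = roi_mask_from_atlas(atlas, [37, 38])

rng = np.random.default_rng(42)
gm = rng.random((16, 16, 16))          # stand-in for one subject's GM map
roi_mean = float(gm[mask].mean())      # one representative feature per ROI
```

In practice, the atlas would first be co-registered to the t-map space before masking, as described above.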


Feature Extraction from Both Types of ROIs
Once the masks are generated from both ROI_CL and ROI_AAL, they are processed further for ROI analysis, where each mask is fused with every patient's image to retain only the important voxels. These voxels are high-dimensional and difficult to use directly as features. Consequently, the feature dimensionality is reduced by averaging their intensity values to obtain a mean intensity value: for each participant's MRI, the voxels within the ROI are averaged to obtain a single representative value. The larger the ROI area, the higher this value, as we have used the weighted mean in Equation (2), which becomes biased in highly activated non-binary ROIs. Although ROI selection is a general and typical process, it can be a pivotal factor for obtaining good results. An efficient SPM extension tool, MARSeille Boîte À Région d'Intérêt (MarsBaR) [35], was used for this feature extraction process:

x̄ = (Σ_{i=1}^{n} w_i x_i) / (Σ_{i=1}^{n} w_i) (2)

Here, x_i represents the voxel intensity value for the n voxels lying inside the ROI and w_i is the weight of the voxel. w_i takes high values, representing high confidence, for voxels within the ROI, and values near zero, representing low confidence, for outlier voxels.
For the VBM-based ROI, the selection was done manually based on cluster size (ROI_CL), for which we selected the clusters with the highest numbers of voxels; e.g., the AD vs. CN t-test on the ADNI dataset yielded 30 clusters, of which we selected 14 clusters with significant voxel numbers, while the remaining small clusters were neglected (please refer to the Supplementary Data file ADNI_AD_CN_p0.001_FWE_BrainLabels). Similarly, the remaining four clusters were selected from the AD vs. CN WM t-test and the CN vs. MCI t-test (please refer to the Supplementary Data file ADNI_CN_MCI_p0.001_FWE_BrainLabels). For ROI_AAL, the selected clusters were matched with the AAL template to obtain significant anatomic regions; those standard ROIs were then used to create the masks for further voxel detection from each ROI for feature extraction. On initial matching, 23 ROIs from AD vs. CN (16 GM and seven WM), nine ROIs from AD vs. MCI, and 11 ROIs from CN vs. MCI were detected. We selected the ones with the highest numbers of voxels from each cluster and the overlapped regions. In total, there were 18 ROIs from the clusters and 14 ROIs from the AAL atlas, as shown in Figures 6 and 7, which were tested separately to obtain the best classification result.
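The weighted-mean feature of Equation (2) can be sketched in a few lines of numpy; the intensity and weight values below are made-up illustrations of a non-binary (confidence-weighted) ROI mask:

```python
import numpy as np

def weighted_roi_mean(values, weights):
    """Weighted mean intensity over an ROI, as in Equation (2):
    sum(w_i * x_i) / sum(w_i), where w_i is near 1 for high-confidence
    voxels inside the ROI and near 0 for outlier voxels."""
    x = np.asarray(values, dtype=np.float64)
    w = np.asarray(weights, dtype=np.float64)
    return float((w * x).sum() / w.sum())

x = np.array([0.5, 0.6, 0.7, 0.2])   # voxel intensities in one ROI
w = np.array([1.0, 1.0, 1.0, 0.1])   # last voxel is a low-confidence outlier
feat = weighted_roi_mean(x, w)
```

Because the outlier voxel carries weight 0.1, it barely moves the feature away from the mean of the three confident voxels; this is the single representative value computed per ROI per scan.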

Gaussian Smoothing
The extracted features from each participant are the weighted mean intensity values of each ROI, which must be more homogeneous within their class and heterogeneous across classes. However, upon careful observation, as in Figure 8, the original data (blue) contain highly unusual spikes at some points. These spikes (x = 70, 110, 120, 144) were considered abnormal for the graph because of their extreme shift from the mean position (this does not mean they are less useful features, as each feature value depends on its cluster size; only with regard to the distribution do we consider them abnormal). They were averaged out using Gaussian smoothing (GS) for accurate classification using the BR-NN. In general, the Gaussian distribution for a one-dimensional vector is given by:

G(x_i) = (1 / (σ_{i,w} √(2π))) exp(−x_i² / (2σ_{i,w}²)) (3)

where x_i represents the input value for the ith feature of each image for each ROI, σ_{i,w} represents the standard deviation for each group of ROI, and w is the size of the window over which the Gaussian-weighted moving average is calculated. This window slides over the specified length and provides the average Gaussian value over that window, as shown in Figure 8. With a Gaussian-weighted moving window, the data are transformed toward a normal distribution after the completion of the window operation. Note that the window is a moving averaging function, hence σ_{i,w} also keeps changing; however, we can approximate the final distribution as normal for a larger window size. In our case, we chose w = 9 from our experiments. In Figure 8, the part from x = 1 to 74 represents the mean values for the AD patients and x = 75 to x = 148 represents the mean values of the CN patients from one cluster of ROI_CL. Here, GS transforms the data from an unknown haphazard distribution to a normal distribution, so that it can be classified with less computational burden using the Bayesian regularized neural network.
It shifts the characteristics of each vector toward the mean value of its target class and brings homogeneity to its cohort. This can be considered a data cleaning step similar to normalization. Furthermore, it can work as a replacement for the feature selection step, where the features are transformed for better classification results.
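A minimal sketch of a Gaussian-weighted moving average over a window of w = 9 samples is shown below. The kernel spread (sigma as a fraction of w), the edge padding, and the synthetic AD/CN feature values are assumptions for illustration; they are not the paper's exact filter parameters:

```python
import numpy as np

def gaussian_weighted_moving_average(x, w=9):
    """Gaussian-weighted moving average over a sliding window of w samples;
    edges are padded by repetition so the output length matches the input."""
    half = w // 2
    idx = np.arange(-half, half + 1)
    sigma = w / 5.0  # heuristic kernel spread; an assumption here
    k = np.exp(-(idx ** 2) / (2.0 * sigma ** 2))
    k /= k.sum()
    padded = np.pad(np.asarray(x, dtype=float), half, mode="edge")
    return np.convolve(padded, k, mode="valid")

rng = np.random.default_rng(0)
raw = np.concatenate([rng.normal(0.55, 0.02, 74),    # e.g., AD cluster means
                      rng.normal(0.62, 0.02, 74)])   # e.g., CN cluster means
raw[40] = 0.9                                        # an abnormal spike
smooth = gaussian_weighted_moving_average(raw, w=9)
```

After filtering, the spike is pulled back toward its neighbors' mean while the AD/CN level difference is preserved, which is the behavior described for Figure 8.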

Bayesian Regularized Feed-Forward Neural Network (BR-FNN)
We used the Bayesian regularization [36] proposed by MacKay in 1992 for training the NN. This regularization technique minimizes a linear combination of squared errors and weights so that, at the end of training, the resulting network has good generalization qualities. Later, in 1997, Foresee and Hagan [37] proposed optimization using the Levenberg-Marquardt (LM) algorithm in the BR-FNN to reduce computational overhead. It minimizes the linear combination of squared errors and weights and finally produces a well-generalized network after iterative training. The Bayesian regularization takes place within the Levenberg-Marquardt algorithm. This can be summarized using the following equations.
Figure 8. Data processing results before and after the Gaussian-weighted smoothing average filter, shown for the 14th of 18 features from ROI_CL. The blue curve is the raw data, whereas the orange and red curves represent the smoothed data for w = 9 and w = 5, respectively.

For training sets of the form {x_1, t_1}, {x_2, t_2}, . . ., {x_n, t_n}, where {x_i, t_i} represents the input value and the corresponding target for the ith term, the sum of squared errors during training is represented by:

E_D = Σ_{i=1}^{n} (t_i − y_i)² (4)

Here, t_i are the target values and y_i are the responses predicted by the neural network during training for the n training inputs.
The objective function for Bayesian optimization and regularization takes the standard form:

F(w) = βE_D + αE_W    (5)

Here, E_W is the sum of squares of the network weights and E_D is the error calculated from (4). The optimization parameters α and β are calculated as in Equations (6) and (7), respectively:

α = γ / (2E_W)    (6)

β = (n − γ) / (2E_D)    (7)
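The α/β update described above can be sketched in a few lines. The following is an illustrative Python sketch, not the authors' MATLAB implementation; it assumes the standard Gauss-Newton approximation H ≈ 2βJᵀJ + 2αI to the Hessian of F(w), with J the Jacobian of the residuals, as used in the Foresee-Hagan formulation:

```python
import numpy as np

def br_update(J, e, w, alpha, beta):
    """One Bayesian-regularization hyperparameter update in the style of
    Foresee and Hagan [37]. J is the Jacobian of the residuals w.r.t. the
    N weights, e the residual vector (t - y), w the current weights."""
    N = w.size
    E_D = float(e @ e)   # sum of squared errors, Eq. (4)
    E_W = float(w @ w)   # sum of squared weights
    # Gauss-Newton approximation to the Hessian of F(w) = beta*E_D + alpha*E_W
    H = 2.0 * beta * (J.T @ J) + 2.0 * alpha * np.eye(N)
    # Effective number of parameters: gamma = N - 2*alpha*trace(H^-1)
    gamma = N - 2.0 * alpha * np.trace(np.linalg.inv(H))
    alpha_new = gamma / (2.0 * E_W)            # Eq. (6)
    beta_new = (e.size - gamma) / (2.0 * E_D)  # Eq. (7)
    return alpha_new, beta_new, gamma
```

With α = 0, the effective number of parameters reduces to γ = N, matching the initialization F(w) = E_D described below.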
Initially, α = 0 and β = 1, so that F(w) = E_D. Here, γ is the effective number of parameters, calculated using the Gauss-Newton approximation to the Hessian matrix available in the LM training algorithm and updated iteratively until convergence [37,38]. We first randomly initialized a two-layer feed-forward NN, with the inputs connected to the hidden layer and the second layer being the output layer, with only two output values for binary classification. The Nguyen-Widrow method [39] is used to initialize the weights and biases of the hidden layer so that the active regions of the layer's neurons are distributed almost evenly over the input space. A hyperbolic tangent sigmoid [40] is used as the activation function in the hidden layer to squash outputs into [−1, 1], whereas the output layer uses a linear (purelin) function. A random partition into training and test sets (7:3) was performed manually on the whole feature set, so that only the training partition participates in training the BR-NN while the test set remains untouched. Once the BR-NN model is trained, the test set is fed into the NN to obtain the testing accuracy.
To obtain the optimal number of hidden neurons, the network was trained with the hidden-layer size varied from 1 to 20, re-initialized randomly each time. The classification performance was evaluated over ten runs for each hidden-layer size and the average performance computed; the resulting graph is shown in Figure 9. Training thus yields 10 independent models per hidden-layer size, i.e., 10 × 20 = 200 models in total for sizes up to 20. Each group of 10 models with the same hidden-layer size was evaluated on the untouched test features, and the average over the ten runs is reported. The best performance was observed for 17 hidden neurons. Table A3 in the Appendix shows the result of each run. Subsequently, this network was used to verify the GARD dataset with its mean ROI features as the input.
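The size search described above can be sketched as follows. Here `train_and_eval` is a hypothetical stand-in for training and testing one randomly initialized BR-NN; only the search-and-average logic mirrors the text:

```python
import numpy as np

def select_hidden_size(train_and_eval, max_units=20, repeats=10, seed=0):
    """For each hidden-layer size 1..max_units, train `repeats` independently
    initialized models and average their test accuracies; return the best
    size and the per-size average accuracies."""
    rng = np.random.default_rng(seed)
    mean_acc = np.zeros(max_units)
    for n in range(1, max_units + 1):
        accs = [train_and_eval(n, rng) for _ in range(repeats)]  # 10 models per size
        mean_acc[n - 1] = np.mean(accs)
    return int(np.argmax(mean_acc)) + 1, mean_acc
```

For instance, plugging in a stub whose accuracy peaks at 17 units returns 17, mirroring the paper's selected size.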
The same process was applied to train the BR-NN from the GARD dataset features and was tested under the same condition for the ADNI dataset features. Both ROI_CL and ROI_AAL were used separately as features.

Figure 9. Testing results from the proposed method trained on the ADNI dataset; the plotted curve is the average result over ten runs for every value of n, the number of hidden neurons.
Table 4 presents the clinical ROIs found in the t-test group comparison between the participating class domains; it lists the important brain regions identified by AAL matching. Six major clusters distinguished the voxel intensities, with the left inferior temporal lobe, both thalami, the left and right amygdala, left fusiform gyrus, left insula, and right hippocampus being the prominent ones for CN versus AD at p < 0.001, whereas the left and right hippocampus, middle temporal lobe, and right parahippocampal region were highlighted as dense clusters for CN versus MCI. Details of all clusters and the numbers of voxels detected are presented in the Supplementary Files. The ROIs identified using the ADNI dataset are clearer and larger, whereas those from the GARD dataset are smaller and dimmer (please see Figures 4 and 5), which may be due to differences in MRI acquisition protocol [41]; still, regions such as the hippocampus, amygdala, temporal lobe, and thalamus commonly deteriorate during AD/ADD and MCI/mAD (please see Tables 2 and 3; the T-value measures the size of the group difference relative to the variation within the group data, and a higher T-value supports rejecting the null hypothesis of no group difference; SD of T is the standard deviation of the T-values and mean-T the mean of all T-values for the voxels in a cluster). It is believed that loss of memory is mostly related to the limbic system, including subcortical structures such as the hippocampus and amygdala, more prominently in the left hemisphere [42].
However, various cortical structures, including the frontal, parietal, and temporal lobes, are also affected during neurodegeneration [43,44]. Hence, it is reasonable to state that the brain structures associated with memory are broadly affected during the memory-dysfunction phase of AD.
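The voxel-wise group comparison behind the SPM t-map can be illustrated with a pooled two-sample t statistic. The following is a NumPy sketch of the statistic only, under the equal-variance assumption; it is not SPM itself, and the subsequent p < 0.001 thresholding and cluster extraction are omitted:

```python
import numpy as np

def voxelwise_tmap(group_a, group_b):
    """Pooled two-sample t statistic per voxel for two (subjects x voxels)
    arrays; positive values mean group_a > group_b at that voxel."""
    na, nb = len(group_a), len(group_b)
    mean_diff = group_a.mean(axis=0) - group_b.mean(axis=0)
    va = group_a.var(axis=0, ddof=1)
    vb = group_b.var(axis=0, ddof=1)
    # Pooled (equal-variance) estimate with na + nb - 2 degrees of freedom
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    return mean_diff / np.sqrt(sp2 * (1.0 / na + 1.0 / nb))
```

Voxels whose |t| exceeds the critical value for p < 0.001 would then form the candidate clusters.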

Experimental Result
The classification performance of the proposed method was assessed using three parameters: accuracy, specificity, and sensitivity. Figure 10 displays the comparative classification results, with the ADNI-trained model on the left and the GARD-trained model on the right. Accuracy of the ADNI-trained model drops from 97% on the ADNI test set to 80% on the GARD set for ROI_CL-based features, whereas the drop is less severe for ROI_AAL-based features. Similarly, accuracy of the GARD-trained model drops from 96% to 91% on the ADNI test set with ROI_CL features and from 97% to 88% with ROI_AAL features. For the ADNI-trained BR-NN models, the first four results are for the 30% ADNI test set, whereas the last four use 100% of the GARD dataset as the test set. To validate the proposed method, ablation experiments were also performed to assess the need for smoothing and the importance of using all features; these experiments also compare efficiency via the time consumed by each experiment. The results of all experiments are shown in Table 5.

Figure 10. Classification test result from BR-NN trained on the ADNI or GARD dataset. For ideal classification, the accuracy, sensitivity, and specificity are all 100%. Training on the features of both datasets together would overfit and might not produce a convincing independent test result; hence, each dataset was trained on independently.
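The three reported parameters follow directly from the confusion-matrix counts; a minimal sketch (with AD/ADD taken as the positive class, an assumption consistent with the text):

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity (recall on patients), and specificity
    (recall on controls) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)  # fraction of AD/ADD cases correctly detected
    specificity = tn / (tn + fp)  # fraction of CN/NC subjects correctly cleared
    return accuracy, sensitivity, specificity
```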

Discussion
Our results suggest that the proposed method classifies AD vs. CN (ADD vs. NC) well, and the network can be trained from either database, as the result is good in both cases. However, for CN vs. MCI (NC vs. mAD), the result is satisfactory on the training dataset but poor on the non-training dataset. This shows that the distinction between CN and MCI is unfavorable with the proposed method. On finer analysis of the features, we found that the CN and MCI features become indistinguishable after Gaussian smoothing, in contrast to CN vs. AD, which may be why the classifier cannot separate them. In general, AD and CN are more discriminative than CN and MCI (this is our primary assumption); during smoothing, the CN features tend toward the CN mean value, as do the AD features, so the MCI group (which lies between AD and CN), when smoothed, loses its MCI characteristics and tends to follow either the AD or the CN features, depending on its feature weights. This is also highlighted in the ROI_CL/ROI_AAL + BR-NN result column of Table 5, where the performance of all test conditions, i.e., AD vs. CN and CN vs. MCI, drops abruptly if Gaussian smoothing is not applied before classification. Similarly, testing with half of the features results in poor accuracy on the non-training dataset, i.e., GARD.
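The Gaussian-weighted moving-average smoothing discussed above (window sizes w = 5 and w = 9 in Figure 8) can be sketched as follows. The kernel width σ = w/4 is an illustrative choice, not necessarily the paper's setting:

```python
import numpy as np

def gaussian_smooth(x, w, sigma=None):
    """Gaussian-weighted moving average over a 1-D feature sequence with an
    odd window of w samples; edges use the truncated, renormalized kernel."""
    x = np.asarray(x, dtype=float)
    sigma = sigma if sigma is not None else w / 4.0  # illustrative default
    half = w // 2
    k = np.exp(-0.5 * (np.arange(-half, half + 1) / sigma) ** 2)
    out = np.empty_like(x)
    for i in range(x.size):
        lo, hi = max(0, i - half), min(x.size, i + half + 1)
        kk = k[half - (i - lo): half + (hi - i)]  # truncated kernel at edges
        out[i] = kk @ x[lo:hi] / kk.sum()         # renormalized weighted mean
    return out
```

As the text argues, such smoothing pulls each subject's features toward its group's local mean, which sharpens AD vs. CN but blurs the intermediate MCI group.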
Cuingnet et al. [2] showed in their experiments that a voxel-based direct method (similar to ROI_CL) and an atlas-based method (similar to ROI_AAL) outperformed cortical-thickness- and hippocampus-volume-based classification in an AD vs. CN independent test on ADNI. Methods using the whole brain also reached higher specificity (over 90%) than those based on the hippocampus. Moreover, in prodromal AD detection (CN vs. MCIc), sensitivity was substantially lower, as in our case. Compared with cortical-thickness and hippocampus analysis, VBM-based methods are simple and direct: SPM can generate tissue probability maps without the extra parcellation (and the parcellation error during registration) required by the FreeSurfer software (https://surfer.nmr.mgh.harvard.edu/ (accessed on 21 February 2021)), which entails heavy time consumption and hardware load. Deep neural networks such as CNNs are beneficial for standalone classification and claim state-of-the-art performance in image classification; however, considering that very few CNN architectures are designed for 3D MRI scans [45], and given the significance of ROI analysis as discussed in Sections 3.2 and 6.1, we were interested in VBM. In VBM, we follow an established preprocessing pipeline to extract and visualize the group cluster differences and generate ROIs of significant regions, so it is more specific for voxel-based group statistical analysis, whereas a CNN is a gradient-supervised algorithm for image-attribute-based feature extraction, focused primarily on classification rather than ROI detection. Table 6 shows a brief comparison with other state-of-the-art results. Zhang et al. [46] used a multimodal imaging technique combining MRI, CSF, and PET data for classification and reported up to 86.2% accuracy. Westman et al.
[47] combined MRI and CSF measurements to classify AD against NC using FreeSurfer-extracted cortical thickness and subcortical volumes. Aguilar et al. [48] used the same features for classification based on the clinical dementia rating (CDR) score. Beheshti et al. [7] used VBM-extracted voxel VOI features with PDF-based feature selection, classified using an SVM. Subsequently, Beheshti et al. [8] reported a higher accuracy of 92.48% using a feature-ranking technique to select the most discriminative features instead of a feature-selection technique. The most used dataset is ADNI, whereas a few researchers have used OASIS [49] to report accuracy. Most of the stated accuracies are from cross-validation (CV). Conversely, we report average accuracy on the test set. We preferred a completely untouched test set, so that the neural network model is trained only on training features (Figure 2); this also yields a generalized and unbiased performance estimate. For verification, we also ran 5-fold CV on the whole feature sets of the ADNI and GARD datasets, obtaining accuracies of approximately 98% and 97%, respectively; however, the most important accuracy is on the other, untouched dataset, which cannot be measured with k-fold CV. Moreover, in contrast to single-step feature extraction from the voxels of the t-map, we implemented multistep AAL-based masking (ROI_AAL) and cluster-based selection of ROIs (ROI_CL), and then used the selected voxels as features (results in the Supplementary Data). These mask-creation steps from the AAL atlas and the clusters help to identify the atrophic anatomical regions of the brain, signifying the probable areas of interest for AD/ADD detection. All these methods are still being progressively updated and modified for better classification performance. Our reported accuracy of 98.22% for CN vs. AD is, to date, the highest using VBM features on the ADNI database.
Further, the accuracy of 90.48% on a completely different dataset, i.e., GARD, indicates that the model generalizes well.

Conclusions
We have presented an efficient approach that extracts mean weighted features from the t-map obtained after VBM processing and then applies Gaussian smoothing to the raw data. This helps to shape the features into a Gaussian distribution, so that the classifier does not require many operations to identify the distinguishing features. Using an excessive number of training samples could over-train the network and make it prone to over-fitting, so comparatively few training samples were used for optimal training of the classifier model, which also supports generalization, as does the use of the BR-NN. Hence, we presented an optimized approach for classifying AD/ADD against CN/NC based on VBM-ROI-extracted, smoothed features. Our contribution lies in the selection of the ROIs and the extraction of the ROI features. Instead of using all voxels of the ROIs generated from the VBM t-test, we used only the mean weighted value over each ROI's voxels, thereby reducing the features and applying feature selection at once. Moreover, using AAL maps to generate standard co-registered ROIs, prioritized by the t-map-generated clusters of each anatomical region for mean-weighted feature extraction, is, to the best of our knowledge, implemented here for the first time. To improve the results, we suggested Gaussian smoothing over a small window, which may also be a new attempt at post-processing. The final classification with the Bayesian regularized NN finds this distribution computationally easier to classify, improving the overall accuracy of the system. Additionally, we experimented with two datasets in reference to each other to validate the process. In summary, our proposed method for optimizing and generalizing AD/ADD detection from T1 MRI yields a significant improvement over previously used VBM-based methods; at the same time, our findings on brain ROIs can help clinicians focus on those specific regions for future discovery.
Hence, we hope our attempt will make a meaningful contribution in the future.
(accessed on 12 May 2021)). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer's Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California. Correspondence should be addressed to Goo-Rak Kwon: grkwon@chosun.ac.kr.

Conflicts of Interest:
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Ethics Statement: As per ADNI protocols, all procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards. More details can be found at adni.loni.usc.edu. (This article does not contain any studies with human participants performed by any of the authors.) The IRB authentication code is 2-1041055-AB-N-01-2019-45/2020-72, issued by the IRB committee of Chosun University.