Classification of Initial Stages of Alzheimer's Disease through PET Neuroimaging Modality and Deep Learning: Quantifying the Impact of Image Filtering Approaches

Abstract: Alzheimer's disease (AD) is a leading health concern affecting the elderly population worldwide. It is defined by amyloid plaques, neurofibrillary tangles, and neuronal loss. Neuroimaging modalities such as positron emission tomography (PET) and magnetic resonance imaging are routinely used in clinical settings to monitor the alterations in the brain during the course of progression of AD. Deep learning techniques such as convolutional neural networks (CNNs) have found numerous applications in healthcare and other technologies. Together with neuroimaging modalities, they can be deployed in clinical settings to learn effective representations of data for different tasks such as classification, segmentation, detection, etc. Image filtering methods are instrumental in making images viable for image processing operations and have found numerous applications in image-processing-related tasks. In this work, we deployed 3D-CNNs to learn effective representations of PET modality data to quantify the impact of different image filtering approaches. We used box filtering, median filtering, Gaussian filtering, and modified Gaussian filtering approaches to preprocess the images and used them for classification with a 3D-CNN architecture. Our findings suggest that these approaches are nearly equivalent and have no distinct advantage over one another. For the multiclass classification task between the normal control (NC), mild cognitive impairment (MCI), and AD classes, the 3D-CNN architecture trained using Gaussian-filtered data performed the best. For binary classification between the NC and MCI classes, the 3D-CNN architecture trained using median-filtered data performed the best, while, for binary classification between the AD and MCI classes, the 3D-CNN architecture trained using modified Gaussian-filtered data performed the best. Finally, for binary classification between the AD and NC classes, the 3D-CNN architecture trained using box-filtered data performed the best.


Introduction
Alzheimer's disease (AD) is a brain disorder that has no effective treatment [1,2]. AD affects brain regions such as the hippocampus, gyrus, etc., during the course of its progression [3,4]. Deposition of amyloid-β plaques and tau aggregates is likely the first stage in the development of AD, with neurodegeneration accompanying these changes as the disease advances.
Finally, another avenue that has been explored is utilizing graph theory and machine learning for MCI-C/MCI-NC binary classification tasks [29]. Along this line, researchers have proposed a multivariate prognostic model for MCI-C/MCI-NC binary classification tasks [30]; proposed a high-order, multimodal, multitask feature-learning model for deciphering the temporal relationship between longitudinal measures and progressive cognitive scores [31]; integrated SPARE-AD and cerebrospinal fluid (CSF) values for MCI-C/MCI-NC binary classification tasks [32]; built a combination of independent component analysis (ICA) and Cox models to predict MCI progression [33]; proposed a framework for the integration of MRI and PET modalities for binary classification tasks such as AD/NC, MCI-C/MCI-NC, NC/MCI-C, NC/MCI, etc. [34]; proposed an approach for building a robust classifier for AD/NC and NC/MCI binary classification tasks [10]; employed sparse logistic regression for MCI-C/MCI-NC binary classification tasks [8]; utilized multimodality image data for diagnosis and prognosis of AD at the MCI or preclinical stages [27]; and built a demographic-adjusted multivariable Cox model for MCI-to-AD conversion [35].
Image filtering methods are commonly deployed in different applications to modify an image. These methods alter the appearance and properties of an image to emphasize or remove certain features, and they have found numerous applications in smoothing, sharpening, and edge enhancement. Box filtering, median filtering, Gaussian filtering, and their modifications have found a number of applications in the domain of image processing. Median filtering is a nonlinear neighborhood operation that replaces each pixel with the median of its neighborhood; it removes noise (particularly impulse noise), along with some fine image detail, while preserving edges better than linear smoothing. Gaussian filtering blurs an image symmetrically by weighting each neighborhood with a Gaussian kernel; it is a linear low-pass filter that smooths the image significantly. Box filtering, another linear low-pass filter, works by averaging the values in the neighborhood region; in general, the filter kernel defines the type of filtering performed for different operations such as smoothing.
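As a rough illustration of how the four filters behave, the following NumPy sketch implements small 2D versions of each. This is a minimal sketch: the study applies the filters to 3D PET volumes, and the kernel sizes and σ below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def _windows(img, k):
    # Edge-pad the image and gather every k×k neighborhood as a view.
    p = k // 2
    padded = np.pad(img, p, mode="edge")
    return np.lib.stride_tricks.sliding_window_view(padded, (k, k))

def box_filter(img, k=3):
    # Linear low-pass filter: mean of each neighborhood.
    return _windows(img, k).mean(axis=(-2, -1))

def median_filter(img, k=3):
    # Nonlinear filter: median of each neighborhood (suppresses impulse noise).
    return np.median(_windows(img, k), axis=(-2, -1))

def gaussian_filter(img, sigma=1.0, k=5):
    # Linear, symmetric low-pass filter with Gaussian-weighted neighborhoods.
    ax = np.arange(k) - k // 2
    g = np.exp(-0.5 * (ax / sigma) ** 2)
    kernel = np.outer(g, g)
    kernel /= kernel.sum()  # normalize so flat regions are preserved
    return (_windows(img, k) * kernel).sum(axis=(-2, -1))

def modified_gaussian_filter(img, sigma=1.0, k=5):
    # "Modified" Gaussian filtering as described in this work:
    # Gaussian filtering applied twice in succession.
    return gaussian_filter(gaussian_filter(img, sigma, k), sigma, k)
```

For example, a lone bright pixel in an otherwise flat region is removed entirely by the median filter but only spread out by the box and Gaussian filters, which is the qualitative difference the text describes.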
In this work, we explored the impact of different image filtering methods such as box filtering, median filtering, Gaussian filtering, and modified Gaussian filtering on the performance of deep convolutional neural networks (CNNs) for the early diagnosis of AD. We deployed 3D-CNN architectures to extract features from the PET neuroimaging modality and classified them into NC, MCI, and AD classes simultaneously and bilaterally. We considered four problems: three-class classification among MCI, NC, and AD classes and binary classifications between MCI and NC, MCI and AD, and NC and AD classes. We did not employ data augmentation for the binary and multiclass classification tasks studied using the PET modality.

Datasets Description
We used scans of the PET neuroimaging modality from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database, as shown in Table 1. The data are presented in mean (min-max) format. We split the datasets at the subject level for the experiments.

Methodology
In this study, we considered four problems: multiclass (three-class) classification between the MCI, NC, and AD classes and three binary classification problems, namely, binary classification between the MCI and NC classes, the MCI and AD classes, and the NC and AD classes. We will now describe the deep learning architectures for solving these problems using the PET dataset.
The detailed 3D-CNN architecture for the three binary and one multiclass (three-class) classification tasks between the NC, MCI, and AD classes using the PET neuroimaging modality and the box filtering, Gaussian filtering, modified Gaussian filtering, and median filtering approaches is presented in Figure 1, while Figure 2 presents some of the scans used in the experiments. In Figure 2, SSIM is the structural similarity index; Figure 2b-e shows the SSIM index of the PET scans relative to the reference image. Here, modified Gaussian filtering means that Gaussian filtering is applied to the input volume, followed by a second round of Gaussian filtering; this is expressed mathematically in Equation (1), where N represents the total number of inputs. For the AD-MCI binary classification task, the number of feature maps in the convolutional feature-extracting layers is 8, while the number of neurons is 100, 30, and 2 for Fully Connected (FC) Layer 1, FC Layer 2, and FC Layer 3, respectively. For the AD-NC binary classification task, the number of feature maps in the convolutional feature-extracting layers is 6, while the number of neurons is 100, 30, and 2 for FC Layer 1, FC Layer 2, and FC Layer 3, respectively.
For the MCI-NC binary classification task, the number of feature maps in the convolutional feature extracting layers is 9, while the number of neurons is 100, 30, and 2 for FC Layer 1, FC Layer 2, and FC Layer 3, respectively.
For the AD-MCI-NC multiclass classification task, the number of feature maps in the convolutional feature extracting layers is 11, while the number of neurons is 500, 300, and 3 for FC Layer 1, FC Layer 2, and FC Layer 3, respectively.

In the architecture given in Figure 1, there is an input layer accepting a volume of size 79 × 95 × 69 with zero-center normalization applied to it, which subtracts the mean and divides by the standard deviation to center the data volume on the origin. After that, a block named Block-A is repeated 5 times; this block is made up of a 3D convolutional layer with a kernel of size 3 in all dimensions and a variable number of feature maps for the extraction of features, with weight and bias L2 factors of 0.00005 applied to mitigate the impact of overfitting. After the convolutional layer, there is a batch normalization layer, which normalizes the incoming batches by subtracting the mean and dividing by the standard deviation, helping to make the training process faster. After this layer, there is an exponential linear unit (ELU) activation layer with an α value of 1, the purpose of which is to introduce nonlinearity to the output of a neuron; mathematically, ELU(x) = x for x > 0 and ELU(x) = α(exp(x) − 1) for x ≤ 0. After this layer, there is a max-pooling layer with a filter and stride size of 2 in all dimensions to reduce the dimensions of the feature maps for computational efficiency.
Block-A is followed by Block-B, which appears a single time and is made up of three fully connected layers with a variable number of neurons for the different classification tasks (FC Layer 1, FC Layer 2, and FC Layer 3); a dropout layer with a probability of 0.1; a softmax layer; and, finally, a classification layer. The FC layers offer full connections to the incoming data volumes for feature extraction, and these layers also have bias and weight L2 factors of 0.00005 to mitigate the impact of overfitting. The dropout layer works by randomly dropping neurons and, in the process, regularizes the inputs, which helps to mitigate overfitting; a factor of 0.1 drops 10% of the neurons from the input. Finally, the softmax layer converts a vector of numbers into a vector of probabilities that sums to one, and the classification layer assigns each input to 2 or 3 classes, depending on the task.
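The dimension flow through the five Block-A stages, and the ELU nonlinearity, can be sketched in plain Python. One assumption here, not stated in the text, is that the 3 × 3 × 3 convolutions use 'same' padding, so only the max-pooling changes the spatial size.

```python
import math

def elu(x, alpha=1.0):
    # ELU activation: identity for positive inputs, alpha*(exp(x)-1) otherwise.
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def block_a_output_shape(shape, num_blocks=5, pool=2, stride=2):
    """Spatial size of a volume after the repeated Block-A stages.

    Each stage: a 3x3x3 convolution (assumed 'same' padding, so
    size-preserving) followed by max-pooling with filter and stride 2,
    which maps a dimension d to floor((d - pool) / stride) + 1.
    """
    dims = list(shape)
    for _ in range(num_blocks):
        dims = [(d - pool) // stride + 1 for d in dims]
    return tuple(dims)
```

Under these assumptions, the 79 × 95 × 69 input shrinks to a 2 × 2 × 2 grid per feature map after the five Block-A stages; with, e.g., the 11 feature maps of the multiclass setting, 2 × 2 × 2 × 11 = 88 features would feed FC Layer 1.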

Experiments and Results
To select an optimal set of hyperparameters for the experiments, we used a five-fold cross-validation approach. To test the effectiveness of the model in a real-world scenario, we built an independent test set that was never used for training. This set has 12 instances of the NC class, 7 of the MCI class, and 4 of the AD class. Different performance metrics, such as relative classifier information (RCI), confusion entropy (CEN), index of balanced accuracy (IBA), geometric mean (GM), Matthews correlation coefficient (MCC), sensitivity (SEN), specificity (SPEC), F-measure, precision, and balanced accuracy, were chosen to assess performance on the different tasks. Only balanced class distributions, in which the training and validation sets contain an equal number of samples per class, were considered for the experiments.
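Several of the binary metrics listed above follow directly from the confusion-matrix counts. A minimal sketch using the standard definitions (the counts in the usage example are made-up values, not results from this study):

```python
import math

def binary_metrics(tp, fp, fn, tn):
    # Standard confusion-matrix-based metrics for a binary classification task.
    sen = tp / (tp + fn)                 # sensitivity (recall)
    spec = tn / (tn + fp)                # specificity
    prec = tp / (tp + fp)                # precision
    f1 = 2 * prec * sen / (prec + sen)   # F-measure
    bal_acc = (sen + spec) / 2           # balanced accuracy
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"SEN": sen, "SPEC": spec, "precision": prec,
            "F-measure": f1, "balanced accuracy": bal_acc, "MCC": mcc}

# Hypothetical example: 4 true positives, 1 false positive,
# 1 false negative, 6 true negatives.
m = binary_metrics(tp=4, fp=1, fn=1, tn=6)
```

RCI, CEN, and IBA are derived from the full (possibly multiclass) confusion matrix and are omitted here for brevity.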
For the experiments, a learning rate of 0.001 was chosen with a piecewise decay policy, under which the learning rate was reduced after every 7 epochs. The total number of epochs was set to 35, and the batch size was set to 2. Adam [36] was chosen as the optimizer, and categorical cross-entropy as the loss function. Examples were randomized after each epoch. The results of the experiments are presented in Tables 2-5.
In Table 2, it can be seen that the 3D-CNN architecture trained on Gaussian-filtered data performed the best for the multiclass classification task, followed by the box-filtering- and modified-Gaussian-filtering-based approaches, with the architecture trained on median-filtered data performing the worst. Indeed, considering RCI, average CEN, average IBA, average GM, and average MCC as performance metrics, the worst-performing model is the 3D-CNN architecture trained using median-filtered data. In terms of the RCI and average CEN metrics, the best-performing model is the 3D-CNN architecture trained using box-filtered data, while, in terms of the average IBA, average GM, and average MCC metrics, the best-performing model is the 3D-CNN architecture trained on Gaussian-filtered data. Figures 3 and 4 display the results and rankings of the methods for the multiclass classification task.

Table 3. Result of binary classification between AD and MCI classes.
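The piecewise learning-rate decay described above can be written as a simple schedule. The drop factor is not stated in the text, so the value below is an assumed placeholder:

```python
def piecewise_lr(epoch, base_lr=1e-3, drop_every=7, factor=0.1):
    # Piecewise decay: multiply the learning rate by `factor` after
    # every `drop_every` epochs. `factor=0.1` is an assumption; the
    # text only specifies the base rate and the 7-epoch drop interval.
    return base_lr * factor ** (epoch // drop_every)
```

Over the 35 training epochs used here, this schedule would drop the rate four times (after epochs 7, 14, 21, and 28).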

In Table 3, it can be seen that the 3D-CNN architecture trained using modified Gaussian-filtered data performed the best for the AD-MCI binary classification task. It is followed by the 3D-CNN architectures trained using Gaussian-filtered and median-filtered data. The worst-performing model is the 3D-CNN architecture trained using box-filtered data. It can also be observed that, in terms of the SEN metric, the best-performing model is the 3D-CNN architecture trained on median-filtered data, while, in terms of SPEC, F-measure, precision, and balanced accuracy, the best-performing model is the 3D-CNN architecture trained on modified Gaussian-filtered data. Figures 5 and 6 display the results and rankings of the methods for the AD-MCI binary classification task.

In Table 4, it can be seen that the 3D-CNN architecture trained using box-filtered data performed the best for the AD-NC binary classification task. It is followed by the 3D-CNN architectures trained on median-filtered and modified Gaussian-filtered data. The worst-performing model is the 3D-CNN architecture trained on Gaussian-filtered data; this also holds in terms of SEN, F-measure, and balanced accuracy, while, in terms of SPEC and precision, the worst-performing model is the 3D-CNN architecture trained using modified Gaussian-filtered data. Figures 7 and 8 display the results and rankings of the methods for the AD-NC binary classification task.

In Table 5, it can be seen that the best-performing model for the NC-MCI binary classification task is the 3D-CNN architecture trained on median-filtered data. It is followed by the 3D-CNN architectures trained on modified Gaussian-filtered and Gaussian-filtered data. The 3D-CNN architecture trained using box-filtered data performed the worst.
In terms of balanced accuracy, F-measure, and SEN, the best-performing model is the 3D-CNN architecture trained on median-filtered data, while, in terms of SPEC and precision, the best-performing model is the 3D-CNN architecture trained on modified Gaussian-filtered data. The worst-performing model in terms of SEN and F-measure is the 3D-CNN architecture trained on modified Gaussian-filtered data. Figures 9 and 10 display the results and rankings of the methods for the MCI-NC binary classification task.

Discussion
Whole-brain-based methods such as those used in this study are most advantageous in advanced stages of the disease, when AD-related brain changes have spread widely enough to affect cognitive functions and activities of daily living. This is also confirmed in our study, since performance on the MCI-NC binary classification task is not as good as on the AD-NC or AD-MCI binary classification tasks. Whole-brain-based methods may be helpful in capturing changes in brain regions that are otherwise undetectable with other approaches, such as region-of-interest-based approaches. Regions such as the medial temporal lobe, cingulate gyrus, frontal gyrus, fusiform gyrus, thalamus, and occipital cortices play a pivotal role in the changes associated with AD and can be captured effectively using whole-brain approaches, which can be instrumental in preclinical studies [37-43].
Despite using a deeper architecture for the MCI-NC binary classification task than for the other two binary classification tasks (AD-NC and AD-MCI), the performance of the compared filtering approaches is not as good, for the reasons mentioned above. For the multiclass classification task, we used a much deeper architecture than in the other classification tasks; however, the performance on this task is the worst of all four. This could be explained by the fact that adding more classes usually degrades performance in classification studies if the number of samples is not increased appropriately. Performance on the AD-NC binary classification task is the best despite using the shallowest architecture of all, which is, again, due to the fact that AD-related brain changes are most easily identified at this level of discrimination. The performance on the AD-MCI binary classification task lies between those of the AD-NC and NC-MCI binary classification tasks, which could be explained by slightly more advanced changes in the brain at this stage. Perhaps region-of-interest-based approaches could be beneficial for the MCI-NC binary and AD-MCI-NC multiclass classification tasks.
This work has several limitations, such as the lack of multimodal information, including neuropsychological and clinical data such as age and FAQ and NPI-Q scores. Considering this information would likely increase the performance of the classifiers further. Another limitation is possible generalization issues related to age across the AD, MCI, and NC stages, since changes in the brain are most pronounced at the AD stage, followed by the MCI and NC stages, respectively.
We did not consider longitudinal data in this study; such data would likely further increase the performance of the classifiers in comparison with cross-sectional data by learning better representations and encodings for individual subjects. MCI subjects could particularly benefit from such data, as MCI lies on a continuum between NC and AD over a span of approximately 36 months, with many possible time points within this span. Finally, a comparison of our best methods with the other methods reported in the literature is given in Table 6.

Conclusions
In this work, we presented a deep learning approach to quantify the impact of different image filtering techniques on the early diagnosis of AD. Box filtering, median filtering, Gaussian filtering, and modified Gaussian filtering approaches were studied, and their impacts on the early diagnosis of AD were explored. The obtained results clearly show that no scheme is superior to the others; in fact, each of the four schemes performed best on a different binary or multiclass classification problem. The Gaussian-filtered image is the most structurally similar to the reference image, followed by the modified Gaussian-filtered, median-filtered, and, finally, box-filtered images. The modified Gaussian filtering approach is novel and was used here for the first time in the literature. Its performance compares well with the other filtering approaches, and an interesting observation is that it never performed the worst in any of the classification tasks considered in this study. This study can be extended further by considering other deep learning approaches, such as graph convolutional networks, as well as filtering methods such as those based on deep learning [59] (see also [60]).

Conflicts of Interest:
The authors declare no conflict of interest.