Convolutional Neural Network-Based Parkinson Disease Classification Using SPECT Imaging Data

In this paper, we use single-photon emission computerized tomography (SPECT) imaging to visualize the patterns produced by dopamine deficiency inside the brain. These patterns are used to establish a patient's disease progression, which helps to separate patients into different categories. We then use a convolutional neural network (CNN) model to classify patients based on the dopamine level inside the brain. The dataset used throughout this paper is the Parkinson's Progression Markers Initiative (PPMI) dataset. The collected dataset was pre-processed, and data amplification was performed to balance it. A CNN-based neural network was defined to classify input SPECT images into four categories. The motivation behind the proposed model is to reduce the resources consumed while maintaining the performance of the classification model, which will help the healthcare ecosystem run the classification model on mobile devices. The proposed model contains 14 layers: an input layer, convolutional layers, max-pool layers, a flatten layer, and dense layers with different dimensions. The final dense layer classifies patients into four categories, namely Parkinson's disease (PSD), healthy control, scans without evidence of dopaminergic deficit (SWEDD), and GenReg PSD, using the entire SPECT imaging dataset to establish the disease progression of different patients. The proposed model is trained on a large dataset, with 58,692 images for training, 11,738 images for validation, and 7826 images for testing. The proposed model outperforms the classification models from the surveyed papers: its accuracy is 0.889, recall is 0.9012, precision is 0.9104, and F1-score is 0.9057.


Introduction
Parkinson's disease (PSD) is a neurological degeneration of the nervous system in which nerve cells are gradually destroyed over time. PSD patients can suffer from various motor and non-motor symptoms, such as instability in walking and balancing, and movement-related disabilities such as tremor, rigidity, postural impairments, and bradykinesia [1]. In the human brain, dopamine is a significant neurotransmitter that is essential for many body processes, including coordination and movement. In healthy control subjects, it transfers signals from one nerve cell to another to control movement. PSD patients suffer from dopamine deficiency, whereby their brains cannot transfer signals to other body parts. Thus, PSD patients suffer from abnormal activity patterns, resulting in severe movement impairments and a high risk of traumatic events [2].
Nowadays, PSD can be detected based on the patient's symptoms. To confirm the PSD, a doctor can examine the patient, analyze the unified Parkinson's disease rating scale (UPDRS), and prepare the medical diagnosis. Doctors and medical practitioners have different viewpoints about the medical diagnosis of PSD patients due to its matching characteristics with other diseases, such as Alzheimer's, May-Hegglin anomaly (MHA), and essential tremor [3]. Therefore, if an assessment is performed based on symptoms, there is a possibility of misdiagnosis [4]. Medical imaging is one possible option to extract and analyze the features from the neural brain [5,6]. It measures dopamine level, glucose metabolism, deficiency of dopaminergic neurons (DNs), and other abnormalities in the brain. In addition, medical imaging is used to observe the visual presentation of radiomic features, which helps establish the disease progression early. Presently, various imaging techniques are used to detect disease progression, where SPECT and positron emission tomography (PET) detect the deficiency of metabolism, DNs, and neurochemical changes inside the human brain [7].
In SPECT imaging, a radioactive tracer is injected into a patient's blood vessel and traced by the scanner. It mainly analyzes the nerve cells that transfer dopamine in the brain. In this paper, SPECT is used to detect the deficiency of dopamine transporter (DT) in the human brain [8]. DT mediates the circulation of the neurotransmitter dopamine among nerve cells, and a DT deficiency interrupts the communication among nerve cells. SPECT is used to evaluate the DT level to establish the patient's disease progression. The health of the nerve cells can be checked using the availability of DT inside the brain. Dopamine transfers the signals that control coordination and body movements, so its deficiency causes movement-related problems for PSD patients. Thus, SPECT images are used to visualize the relevant area inside the brain of PSD patients. This visualization reflects the dopamine level and generates a pattern inside the brain. The shape and strength of this pattern indicate the health of the nerve cells, which helps to identify PSD, healthy control, scans without evidence of dopaminergic deficit (SWEDD), and GenReg PSD patients.
We used a deep-learning-based CNN model to classify the patients based on the dopamine level inside the brain. Initially, SPECT images were obtained from the Parkinson's Progression Markers Initiative (PPMI) dataset. We pre-processed the images with data normalization techniques and scaled them to the range of 0 to 1. The pre-processed data were then fed into the input layer of the CNN model, which consists of 14 layers with several dimensions and kernel sizes. The final dense layer classifies the patients into four categories: PSD, healthy control, SWEDD, and GenReg PSD. Next, we estimated the model's performance with different evaluation metrics such as precision, recall, accuracy, and F1-score.

Motivation
The authors in [9] used diagnostic tests to identify PSD. It is not straightforward to identify PSD from a diagnostic test, and there is a chance of misdiagnosis. For more accurate prediction, the authors in [8,10] used various machine learning (ML) algorithms, such as linear regression, support vector machine (SVM), random forest, and decision tree classifiers, for the classification of PSD patients, which overcomes the limitations of a diagnostic test. However, ML algorithms rely on manually extracted features to predict the diagnosis. The deep learning (DL) technique helps to differentiate PSD, healthy control, SWEDD, and GenReg PSD patients and to improve the model's performance. Motivated by this, in this paper, we present a CNN-based DL algorithm using the SPECT medical imaging technique to monitor the deficiency of DT and measure the patient's disease progression.

Research Contributions
The research contributions of this scheme are the following:
• We proposed a CNN-based model to classify patients with Parkinson's disease accurately. The CNN-based model infers results within a few seconds, and it is trained using the SPECT imaging dataset;
• The proposed model monitors the deficiency of DT and, with the help of SPECT images, classifies the input into the four categories of PSD, Control, SWEDD, and GenReg PSD. We have made this model smaller in size, which helps organizations overcome the scarcity of computational power in remote areas. As the model is small, it can also be deployed on a smartphone. To check that performance is maintained while the model size is reduced in terms of parameters, we compared the accuracy, precision, recall, and F1-score with the state-of-the-art models.
Figure 1 presents the organization of the paper. The rest of the paper is structured as follows. Related work is presented in Section 2. Section 3 describes the system model and problem formulation. Section 4 presents the proposed scheme. Performance evaluation is outlined in Section 5. Section 6 concludes the paper.

Related Work
This section presents the researchers' state-of-the-art works and gives a tabular comparison with the proposed schemes. Nowadays, neuroimaging modalities provide spatial image resolution techniques to accurately predict neurodegenerative diseases such as PSD and Alzheimer's disease. There are various imaging modalities such as magnetic resonance imaging (MRI), functional magnetic resonance imaging (fMRI), positron emission tomography (PET), and SPECT. Their different roles are to extract ROI features such as radiomic features, DT, glucose metabolism, and accurate diagnosis of a disease. Manually, doctors can assess the patient using a diagnostic and clinical test, which takes more time to identify the patients with PSD. It also does not provide a promising diagnosis of a patient. To resolve this issue, many authors have used ML techniques for patient classification and to help measure a patient's disease progression based on ROI [11]. The authors in [12] used ML techniques for early detection of PSD using SPECT DaTScan imagery. In this scheme, the CNN-based model is trained to predict the early diagnosis of a patient. The model classifies the PSD and non-PSD patients correctly. They used a VGG16-based transfer learning scheme to build the model. The model provides an accurate diagnosis of a patient at an early stage of the disease.
Then, Mohammed et al. [13] proposed a DL model for the correct diagnosis of PSD using SPECT images. They used a CNN-based architecture with a 10-fold cross-validation process. The model correctly classifies healthy control subjects and PSD patients, but the authors do not specify any biological parameters for extracting features from SPECT images. This problem was resolved by the authors in [14], who designed an automatic classification algorithm to predict the DT from SPECT images and identify the risk level of a patient. They proposed a DL-based model trained to be robust to variations in image characteristics without loss of diagnostic accuracy, and used a transfer learning scheme to build the model. Then, Ortiz et al. [15] designed a PSD detection scheme that extracts isosurface-based features using a CNN. They used LeNet- and AlexNet-based transfer learning models to reduce the complexity of the input. The model extracts the spread area from SPECT images to produce the disease progression level of the patient.
Later, Adams et al. [16] presented the prediction of motor and non-motor symptoms using a DL technique. In this scheme, the authors trained a CNN-based model to predict the UPDRS-III scores year-wise, using longitudinal SPECT data to lower the average difference of the predictions. The model is trained and tested with a ten-fold cross-validation method, but the authors did not report result-oriented parameters such as accuracy, F1-score, precision, and recall. To address the issues mentioned above, the model proposed in this paper extracts the deficiency of DT from the SPECT images. This deficiency generates the disease patterns used to identify the progression level of the disease at every stage of PSD. This level helps to identify how critical the patient's condition is, so that preventive measures can be taken to reduce the severity of the disease. We also design a CNN-based scheme to differentiate the DT-level patterns of healthy control, PSD, SWEDD, and GenReg PSD subjects. The proposed CNN model has 14 layers, where each layer uses its own kernel size and dimension, and the output of each layer is fed as the input of the next layer. The model outperforms the state-of-the-art works on evaluation parameters such as precision, recall, and accuracy. Table 1 presents the comparative analysis of the proposed model with the state-of-the-art works.

System Model and Problem Formulation
This section presents the system model and problem formulation of the scheme with several mathematical equations. Figure 2 shows the proposed system model. The SPECT imaging data were obtained from the PPMI dataset [17], which contains information on size, imaging modality, and settings in three-dimensional form. The data comprise gray scale images with 128 × 128 resolution. First, we pre-processed the data using min-max normalization, which scales the data to the range of 0 to 1. The dataset is composed of four labels: PSD, healthy control, SWEDD, and GenReg PSD. A few labels have only a small quantity of data in the dataset, which creates the problem of an unbalanced dataset. To balance the dataset, we applied a data augmentation process to the images. After the pre-processing, the data are split into training, validation, and testing sets. The training set is used to train the model, the validation set is used to check how correctly the model has been trained, and the testing set is used to test the trained model. We train the system model, which has 14 layers with varying output dimensions and kernel sizes, using the training samples. Initially, each input image $I$ is convolved with a filter of size $F_i$, which is presented as $O_1 = I * F_i$ and generates an output of dimension $(n - F_i + 1) \times (n - F_i + 1)$, where the dimension of the image is denoted by $(n, n)$ and the dimension of the filter by $F_i$. After that, the ReLU activation function is applied to generate the output $O_2$. Further, we applied forward propagation with initial random weights $W_i$ and biases $b_i$ to each layer, which is presented as $O_3 = W_i \cdot O_2 + b_i$, with the predicted output $O_4$ obtained by applying the layer's activation function to $O_3$. This computation of weights and biases is run for all layers of the proposed model.
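As a minimal sketch of the two computations described above, the following Python snippet applies min-max normalization to a short list of pixel values and evaluates the output side length $n - F_i + 1$ of a convolution. The function names `min_max_normalize` and `conv_output_size` are illustrative and do not come from the authors' code; stride 1 and no padding are assumptions that match the formula given in the text.

```python
def min_max_normalize(pixels):
    """Scale a flat list of pixel intensities to the range [0, 1]."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        # A constant image has no contrast; map it to all zeros.
        return [0.0 for _ in pixels]
    return [(p - lo) / (hi - lo) for p in pixels]


def conv_output_size(n, f):
    """Side length after convolving an n x n image with an f x f filter
    (assumed stride 1, no padding): n - f + 1."""
    return n - f + 1


normalized = min_max_normalize([0, 64, 128, 255])
side = conv_output_size(128, 3)
```

For a 128 × 128 input and a 3 × 3 filter, `side` evaluates to 126, which agrees with the dimension quoted for the first convolution layer.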
The first layer is the input layer, with a dimension of 128 × 128. The second layer is a 2D convolution layer with input dimension 128 × 128 and kernel size 3 × 3. Its output dimension is 126 × 126, which is the input for a 2D max-pool layer with kernel size 2 × 2. The output resolution of the max-pool layer is 63 × 63. This procedure is repeated for the subsequent layers elaborated in Table 2. After the series of convolution blocks, a flatten layer is introduced, which converts the 3D tensor into a 1D vector of size 256. The flatten layer is followed by two dense layers with dimensions 512 and 4, which use ReLU and softmax as activation functions, respectively. The first dense layer helps to select the dataset's feature set to enhance the model's classification result. The last dense layer of dimension 4 is the model's output layer, which uses the softmax activation function to classify the PSD, healthy control, SWEDD, and GenReg PSD subjects from the collected dataset. Moreover, we use testing data to test the model. To improve the performance of the model, we applied backpropagation, whereby the model updates its parameters in such a way that its predictions and performance improve. To update a parameter value, we use the following equation:

$$n_p = o_p - L_r \cdot G_p \qquad (1)$$

Equation (1) presents the new parameter value $n_p$, where $o_p$ is the old parameter value, $L_r$ is the learning rate, and $G_p$ is the gradient of the parameter. At every step, the update is scaled by the learning rate, and we can observe the parameter updates and the result of the model.
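The layer-by-layer dimensions described above can be traced with a few lines of Python. This is a sketch under stated assumptions rather than the authors' implementation: it assumes unpadded 3 × 3 convolutions with stride 1, 2 × 2 max pooling with stride 2, five convolution and pooling blocks, and 64 filters in the last block. The block count is not stated at this point in the text; it is chosen because it is consistent with the 14-layer total (one input layer, ten convolution and pooling layers, one flatten layer, two dense layers) and reproduces the flatten size of 256.

```python
def conv_valid(n, k=3):
    """Side length after a k x k convolution with stride 1 and no padding."""
    return n - k + 1


def max_pool(n, k=2):
    """Side length after k x k max pooling with stride k (floor division)."""
    return n // k


def trace_shapes(n, num_blocks):
    """Return the side length after every convolution and pooling layer."""
    sizes = []
    for _ in range(num_blocks):
        n = conv_valid(n)
        sizes.append(n)
        n = max_pool(n)
        sizes.append(n)
    return sizes


# Assumption: five conv + max-pool blocks, the last one with 64 filters.
sizes = trace_shapes(128, num_blocks=5)
flatten_size = sizes[-1] * sizes[-1] * 64
```

The first two entries of `sizes` are 126 and 63, as in the text, and the final 2 × 2 × 64 feature map flattens to 256 values.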
First, we consider the fully connected layer, derive the gradients of its weights and bias, and update the weights. Equation (2) presents the derivative of the error $E_r$ with respect to the weight $W_i$, $\partial E_r / \partial W_i$. To evaluate it, we first compute the error with respect to the final output (Equation (3)), where $E_r$ is the error, $a_v$ is the actual output, and $O_4$ is the predicted output, and then differentiate $E_r$ with respect to $O_4$ to obtain $\partial E_r / \partial O_4$ (Equation (4)). Next, differentiating $O_3$ with respect to $W_i$ yields $O_2$ itself:

$$\frac{\partial O_3}{\partial W_i} = O_2 \qquad (5)$$

Equation (5) gives the change in output with respect to the weight during backpropagation. Now that we have the individual derivatives, we can use the chain rule to find the change in $E_r$ with respect to $W_i$:

$$\frac{\partial E_r}{\partial W_i} = \frac{\partial E_r}{\partial O_4} \cdot \frac{\partial O_4}{\partial O_3} \cdot \frac{\partial O_3}{\partial W_i} \qquad (6)$$

Equation (6) presents the change in error with respect to the weight. Finally, we update the values in the weight matrix:

$$W_n = W_o - L_r \cdot \frac{\partial E_r}{\partial W_i} \qquad (7)$$

Equation (7) presents the new weight $W_n$, where $W_o$ denotes the old weight and $L_r$ denotes the learning rate of the model.
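The chain-rule computation described above can be checked numerically on a single neuron. The sketch below is illustrative only: the squared-error loss $E_r = \frac{1}{2}(a_v - O_4)^2$ and the sigmoid activation are assumptions chosen to keep the example small, not choices confirmed by the text, and all function names and numeric values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w, b, o2):
    """O_3 = w * O_2 + b (pre-activation), O_4 = sigmoid(O_3) (prediction)."""
    o3 = w * o2 + b
    return o3, sigmoid(o3)


def error(a_v, o4):
    """Assumed squared-error loss: E_r = 0.5 * (a_v - O_4)^2."""
    return 0.5 * (a_v - o4) ** 2


def grad_w(w, b, o2, a_v):
    """Chain rule: dE/dW = dE/dO_4 * dO_4/dO_3 * dO_3/dW."""
    _, o4 = forward(w, b, o2)
    dE_dO4 = -(a_v - o4)       # derivative of the assumed loss
    dO4_dO3 = o4 * (1.0 - o4)  # derivative of the sigmoid
    dO3_dW = o2                # O_3 is linear in W
    return dE_dO4 * dO4_dO3 * dO3_dW


w_old, b, o2, a_v, lr = 0.5, 0.1, 0.8, 1.0, 0.1
g = grad_w(w_old, b, o2, a_v)

# Central finite difference as an independent check of the analytic gradient.
eps = 1e-6
numeric = (error(a_v, forward(w_old + eps, b, o2)[1])
           - error(a_v, forward(w_old - eps, b, o2)[1])) / (2 * eps)

w_new = w_old - lr * g  # W_n = W_o - L_r * dE/dW
```

The analytic gradient `g` and the finite-difference estimate `numeric` agree closely, and a single update step lowers the error on this toy input, which is the behavior the weight-update rule is meant to produce.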
After model training, the validation set is used to check the correctness of the trained model, and its performance is verified using the performance metrics. After the validation process, the testing data are used to test the model, which classifies each input into one of four categories: PSD, control, SWEDD, and GenReg PSD.

Proposed Model
This section describes the architecture of the proposed model. It starts with insights into the dataset used by the proposed model. Since the data are in raw format, it is essential to pre-process them. The proposed model is trained on SPECT images from the PPMI database [17]. To improve robustness and prevent overfitting, a holdout set is also considered during training.

Dataset Description
The SPECT images are collected from PPMI [17] in digital imaging and communications in medicine (.dcm) format. The dataset also provides other data such as gender, age, and the number of visits. The images have 128 × 128 resolution, with a non-uniform number of slices per image, so every image matrix has a 3D shape. The images are labeled into four categories:
* PSD: SPECT images of persons suffering from PSD. The number of images for this category is 68,164. There are 902 participants in this category, whose data are considered for the classification process.
* Control: SPECT images of persons not suffering from PSD. The number of images for this category is 6480. There are 237 participants in this category, whose data are considered for the classification process.
* SWEDD and GenReg PSD: There are 619 participants in the combined category of SWEDD and GenReg PSD, whose data are considered for the classification process.
The details regarding the race of the patients, their gender, and their age distribution are elaborated in [17]. Of the data, 75% is used for training, 15% for validation, and 10% for testing. This corresponds to 58,692 images for training, 11,738 images for validation, and 7826 images for testing.
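The reported split sizes can be reproduced from the per-category image counts. The short check below uses the PSD and Control counts given above together with the initial SWEDD (3372) and GenReg PSD (240) counts reported in the Pre-Processing section; rounding each share to the nearest whole image is an assumption, since the text does not say how fractional images were handled.

```python
# Per-category image counts before augmentation, as reported in the paper.
counts = {"PSD": 68164, "Control": 6480, "SWEDD": 3372, "GenReg PSD": 240}
total = sum(counts.values())

# 75% / 15% / 10% split; rounding to the nearest image is an assumption.
fractions = {"train": 0.75, "validation": 0.15, "test": 0.10}
split = {name: round(total * frac) for name, frac in fractions.items()}
```

With these inputs the total is 78,256 images, and the three shares come out as 58,692, 11,738, and 7826, matching the figures in the text.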

Pre-Processing
In the data pre-processing step, the details of images and labels are collected from PPMI [17], which provides the metadata as a CSV file and a collection of directories containing images in DCM format. The authors used the pydicom library to read the data from the files. SPECT images are grayscale, with 128 × 128 resolution and a non-uniform number of slices. The dataset consists of four labels, where the number of images is much smaller for the SWEDD, GenReg PSD, and Control categories. To balance the dataset, we used data augmentation and pre-processing techniques. The first step in image pre-processing is to flatten the stacked SPECT images, i.e., to convert a 3D brain cross-section image into an array of 2D images. After the flattening of 3D images, the pixel values are normalized to the range of 0 to 1 for faster convergence. For the SWEDD, GenReg PSD, and Control categories, intensive augmentation techniques are applied: vertical flip, brightness change in the range of 0.8 to 1.3, image rotation in the range of 0 to 40 degrees, zoom in the range of 0 to 0.3, and shear in the range of 0 to 0.2. The data augmentation is performed in online mode: the augmented images are not stored; instead, the augmentation takes place during the training session for data amplification, and then the fine-tuning process is carried out. The data are fed into the model through the ImageDataGenerator class of the TensorFlow library. To execute the data augmentation process, the input data are segregated into directories named train, test, and validation. All three directories contain four sub-directories, one for each category. This reduces the inference time and complexity of the model. The algorithmic explanation for the pre-processing is elaborated in Algorithm 1.

Algorithm 1 (excerpt)
    for β = 1, 2, ..., Layer_y do
9:      ▷ D_x is a temporary array for storing images generated through augmentation
10:     D_y ← L[α]
11:
12:     P_y^α → append(D_y)
13:   end for
14:
15: end for
16: return (P_x, P_y)   ▷ returns the P_x, P_y values
17: end procedure

Through the data augmentation process, a maximum of 32 unique images are generated for each input image, which helps to improve the model's robustness towards unseen conditions. The vertical flip makes the model robust to mirror-image inputs, while the rotation and zooming of input images keep the model's robustness to the orientation of input images in check.
With the help of the shearing technique, the bloatedness of the image is considered during training, and through the brightness change, the spots where sudden changes occur can be inferred by the model during the fine-tuning session. After completing the pre-processing and data amplification steps, the final numbers of images for the SWEDD, GenReg PSD, and Control categories are as follows:
* Control: Initially, there are 6480 images for the control category; after the amplification steps, there are nearly 58,000 images.
* SWEDD: Initially, there are 3372 images for the SWEDD category; after the amplification steps, there are nearly 42,000 images.
* GenReg PSD: Initially, there are 240 images for the GenReg PSD category; after the amplification steps, there are nearly 15,000 images.
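To illustrate what online augmentation means in practice, the sketch below draws a fresh set of augmentation parameters from the ranges quoted in the text each time an image is requested, and applies the two transformations that need no interpolation (vertical flip and brightness scaling). It is a simplified stand-in written with the standard library only, not the authors' pipeline: the helper names, the 50% flip probability, and the uniform sampling are assumptions. In Keras, comparable options are exposed through `ImageDataGenerator` arguments such as `rotation_range`, `zoom_range`, `shear_range`, `brightness_range`, and `vertical_flip`, whose sampling conventions may differ from the simple draws used here.

```python
import random


def sample_augmentation(rng):
    """Draw one set of augmentation parameters from the ranges in the text."""
    return {
        "vertical_flip": rng.random() < 0.5,  # flip probability is assumed
        "brightness": rng.uniform(0.8, 1.3),
        "rotation_deg": rng.uniform(0.0, 40.0),
        "zoom": rng.uniform(0.0, 0.3),
        "shear": rng.uniform(0.0, 0.2),
    }


def vertical_flip(image):
    """Flip a 2D image (list of rows) top to bottom."""
    return image[::-1]


def adjust_brightness(image, factor):
    """Scale pixel values in [0, 1] by factor and clip back to [0, 1]."""
    return [[min(1.0, max(0.0, p * factor)) for p in row] for row in image]


def augment_online(image, rng):
    """Apply a freshly sampled flip and brightness change to one image.

    Rotation, zoom, and shear need interpolation and are left to an image
    library; only their parameters are sampled here.
    """
    params = sample_augmentation(rng)
    out = vertical_flip(image) if params["vertical_flip"] else image
    out = adjust_brightness(out, params["brightness"])
    return out, params


rng = random.Random(0)  # fixed seed so the sketch is reproducible
image = [[0.0, 0.5], [1.0, 0.25]]  # tiny normalized stand-in for a slice
augmented, params = augment_online(image, rng)
```

Because nothing is written to disk, every call produces a new variant of the same source image, which is how the minority categories can be amplified during training without storing the augmented copies.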

Model Motivation
Based on our exploration in Section 2, most solutions for Parkinson's disease classification use pre-trained architectures such as VGG-Net, ResNet, DenseNet, and many more. These ImageNet-competition-based architectures are quite complex, as they have more than 50 layers. Due to the smaller size of the dataset, the pre-trained models tend to overfit during training and thus do not provide significant results on the validation and test datasets. This problem can be solved by using a large dataset. However, a complex architecture takes more computational resources and time to obtain the inference from the input. Thus, these models are not viable in practical scenarios in the many places where organizations have financial constraints. The proposed model aims to train on a large dataset with a CNN-based model of nearly 14 layers, without compromising the performance parameters. After several implementations and rounds of hyperparameter tuning, we put forward the proposed model, which achieves a large performance boost within the defined constraints. Figure 3 presents the proposed architecture. The architecture is based on the convolution block, which helps to extract features from the collection of images. The input to the first convolution layer is a tensor of shape 128 × 128 × 1 (the last dimension shows that it is a gray scale image). The first layer is a convolution layer with a 3 × 3 kernel size and 16 filters, followed by a max-pooling layer with 2 × 2 kernels and the same padding. After that, we have a convolution block of 32 filters, each with 3 × 3 kernels, followed by a standard max-pooling layer. Following this, two convolution layers have 64 filters with 3 × 3 kernels, each followed by a standard max-pooling layer. The features can then be extracted from the output matrix. The activation function used in all of these layers is ReLU.
The extracted features are passed to a dense layer with 512 neurons and a ReLU activation function, which helps to classify the labels from the extracted features. The last layer consists of 4 neurons representing the 4 labels of our classification problem, with softmax as the activation function. The whole classification model consists of 230,788 parameters, all of which are trainable. The model is trained using the Adam optimizer with categorical cross-entropy as the loss function. The initial learning rate is $1 \times 10^{-4}$. Callbacks are also added to obtain the best weights during training and testing and to prevent the model from overfitting.
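The stated parameter count can be cross-checked with the standard formulas for convolution and dense layers. The sketch below is an inference from the reported numbers, not a detail confirmed by the paper: the text lists one 16-filter, one 32-filter, and two 64-filter convolution layers, whereas the filter sequence assumed here contains a third 64-filter layer, because that configuration is consistent with the 14-layer total, the flatten size of 256, and the 230,788 parameters. The function names are illustrative.

```python
def conv2d_params(in_channels, out_channels, kernel=3):
    """Weights plus one bias per filter for a kernel x kernel convolution."""
    return kernel * kernel * in_channels * out_channels + out_channels


def dense_params(in_features, out_features):
    """Weights plus one bias per output neuron."""
    return in_features * out_features + out_features


# Assumed filter sequence: 16, 32, 64, 64, 64 (the third 64 is an assumption).
filters = [16, 32, 64, 64, 64]
channels = [1] + filters  # a gray scale input has a single channel

conv_total = sum(conv2d_params(c_in, c_out)
                 for c_in, c_out in zip(channels, filters))
flatten_size = 256  # flatten output size reported in the paper
dense_total = dense_params(flatten_size, 512) + dense_params(512, 4)
total_params = conv_total + dense_total
```

Under this assumption the convolution layers contribute 97,152 parameters and the two dense layers 133,636, for a total of 230,788. Max-pooling and flatten layers have no trainable parameters.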

Performance Evaluation
This section evaluates the performance of the proposed model, which is then compared with standard existing models used for image-based datasets, such as AlexNet [15], GoogleNet [18], VGG-19 [19], ResNet [20], and DenseNet [21], for the classification of PSD based on SPECT images. The proposed model is effective for the defined objective. It is trained using the TensorFlow framework [22] on Python 3.8.0. To evaluate the performance of the proposed model, the authors used metrics such as accuracy, precision, and recall, which are the target attributes to be maximized. The formulas of the metrics used for evaluation are as follows:

$$\text{Precision}_{c_1} = \frac{TP}{TP + FP}, \qquad \text{Recall}_{c_1} = \frac{TP}{TP + FN}, \qquad \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

Here, $c_1$ refers to category 1 from the defined list of categories. TP stands for true positive, which means that the model correctly predicts the category $c_1$. Similarly, TN stands for true negative, which means that the model correctly predicts that the category is not $c_1$. FP stands for false positive, which means that the model predicts the category $c_1$ although the sample belongs to one of the other categories. In the same way, FN stands for false negative, which means that the model predicts a category other than $c_1$ although the sample actually belongs to $c_1$.

Figure 4 gives insights into the accuracy on the training and validation datasets for the PSD classification task. The authors trained the model with early-stopping callbacks that stop the training when the improvement in performance falls below a predefined lower limit set in the hyperparameters. At the end of the training, the model saves the best weights from all the performed epochs. These weights are selected based on the defined metrics and the best combinations on the validation dataset. Such stopping mechanisms help to use resources effectively and sometimes also help to prevent overfitting.
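The definitions above translate directly into code. The following sketch derives TP, FP, FN, and TN for one category from a multi-class confusion matrix in a one-vs-rest fashion and then evaluates precision, recall, accuracy, and the F1-score (the harmonic mean of precision and recall). The confusion matrix is hypothetical and is not data from the paper; the function names are illustrative.

```python
def one_vs_rest_counts(confusion, c):
    """TP, FP, FN, TN for class index c; confusion[i][j] counts samples of
    true class i that were predicted as class j."""
    n = len(confusion)
    tp = confusion[c][c]
    fn = sum(confusion[c][j] for j in range(n) if j != c)
    fp = sum(confusion[i][c] for i in range(n) if i != c)
    total = sum(sum(row) for row in confusion)
    tn = total - tp - fn - fp
    return tp, fp, fn, tn


def precision(tp, fp):
    return tp / (tp + fp)


def recall(tp, fn):
    return tp / (tp + fn)


def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(p, r):
    return 2 * p * r / (p + r)


# Hypothetical 4-class confusion matrix (not data from the paper).
confusion = [
    [50, 2, 1, 0],
    [3, 40, 2, 0],
    [1, 1, 30, 1],
    [0, 0, 2, 20],
]
tp, fp, fn, tn = one_vs_rest_counts(confusion, 0)
p, r = precision(tp, fp), recall(tp, fn)
acc = accuracy(tp, fp, fn, tn)
f1 = f1_score(p, r)
```

As a consistency check on the reported results, `f1_score(0.9104, 0.9012)` evaluates to approximately 0.9058, which agrees with the reported F1-score of 0.9057 up to rounding of the inputs.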
Here, the training accuracy increases significantly up to the 35th epoch, while the validation accuracy shows only a moderate change. The validation accuracy is lower than the training accuracy because of the unbalanced data; the motivation behind the further training is to improve the model's performance for the classes with a smaller amount of data. Figure 5 shows the loss on the training and validation datasets. As seen in Figure 5, the loss for the training dataset decreases continuously, while the loss for the validation dataset increases at an irregular pace. This is evidence that the early-stopping callbacks indirectly help the model's performance: after the 25th epoch, the loss for the validation dataset is nearly equal to the training loss, which is good for accurate predictions. The authors' objective is to identify accurately, based on the SPECT images, whether the patient has PSD or not. Therefore, the authors focus on the model's result for the PSD class, which represents persons having Parkinson's disease, and have compared the result of the proposed model for the PSD class with the AlexNet model presented in [13]. Figure 6 compares metrics such as precision, recall, and accuracy of the proposed model with AlexNet for the PSD class. The proposed model outperforms AlexNet on all the metrics, with a precision of 0.9798, a recall of 0.9928, and an accuracy of 0.9957. To provide further evidence that the proposed model improves on the baselines, its results are also compared (Table 3) with results on another dataset, given in [18]. This dataset contains two kinds of data, 4-category (healthy, early, mid, and late PSD) and 6-category (healthy and HYS-1 to HYS-5), and each of them contains two types of images: gray scale images and pseudocolor images.
The performance of the standard models on the gray scale images of the four-category data can be compared with our proposed model's performance. This dataset contains 1010 SPECT images overall, of which 30 images belong to the healthy category; 110 and 135 images belong to HYS-1 and HYS-2, respectively, which form the early stage; 265 images belong to HYS-3, which represents the mid stage; and the remaining 435 and 35 images belong to HYS-4 and HYS-5, respectively, which form the late stage. Likewise, the proposed model is trained on gray scale images distributed in four categories. To compare the performance with their dataset, the authors of this paper compared the input size and the time taken to perform one epoch of training with the same batch size of 10. Table 3 compares the time taken in seconds by the pre-trained deep CNN models [18] with their input size and number of layers. The proposed model is simple, yet effective: even with the comparatively large dataset of SPECT images, it takes less training time and performs significantly well. Table 4 compares the result obtained by the proposed model with the standard deep CNN models discussed in [18]. The best results obtained in [18] for accuracy, recall, and precision are 0.825, 0.758, and 0.874, achieved by AlexNet, VGG19, and AlexNet, respectively, for the gray scale images. The authors have shown the result achieved on the validation dataset for all three metrics. The proposed model outperforms all the models discussed in [18] by obtaining an accuracy of 0.889, a recall of 0.9012, a precision of 0.9104, and an F1-score of 0.9057. Considering this solid evidence, it can be said that the proposed model performs well compared to the existing models.

Conclusions
Classifying medical data is a difficult task, and it needs to be performed precisely. The objective of this paper is to take one category of medical data and predict a disease accurately. The authors considered a SPECT-image-based dataset to classify Parkinson's disease. Several transfer-learning-based algorithms have shown promising results; however, they are complex models that consume more resources. The paper's main objective is to propose a simple, yet efficient, model that consumes fewer computational resources and performs well compared to existing models. In this paper, the authors consider a large dataset containing three PSD categories and one healthy control category to train the model efficiently. The proposed model classifies the SPECT images into the four defined categories with a precision of 0.9104, a recall of 0.9012, an accuracy of 0.889, and an F1-score of 0.9057. To provide further evidence of the proposed model's better performance, the authors compared the result obtained for the PSD category with that of AlexNet. In the future, the authors will further improve the performance of PSD detection by using a hybrid model.