Application of Deep Learning Models for Automated Identification of Parkinson’s Disease: A Review (2011–2021)

Parkinson’s disease (PD) is the second most common neurodegenerative disorder, affecting over 6 million people globally. Although there are symptomatic treatments that can improve patients' survival, there are no curative treatments. The prevalence of PD and disability-adjusted life years continue to increase steadily, leading to a growing burden on patients, their families, society and the economy. Dopaminergic medications can significantly slow down the progression of PD when applied during the early stages. However, these treatments often become less effective as the disease progresses. Early diagnosis of PD is crucial for prompt intervention so that patients can remain self-sufficient for as long as possible. Unfortunately, diagnoses are often late, owing to factors such as a global shortage of neurologists skilled in early PD diagnosis. Computer-aided diagnostic (CAD) tools based on artificial intelligence methods, which can perform automated diagnosis of PD, are gaining attention from healthcare services. In this review, we identified 63 studies, published between January 2011 and July 2021, that proposed deep learning models for the automated diagnosis of PD using various modalities, including brain analysis (SPECT, PET, MRI and EEG) and motor symptoms (gait, handwriting, speech and EMG). From these studies, we identify the best-performing deep learning model reported for each modality and highlight the current limitations that are hindering the adoption of such CAD tools in healthcare. Finally, we propose new directions for research on deep learning in the automated detection of PD, in the hope of improving the utility, applicability and impact of such tools for the early detection of PD globally.


Introduction
The purpose of this systematic review is to provide a comprehensive review of automated Parkinson's disease (PD) detection using deep learning models, and to further

Background
PD is an incurable neurological disease that results in progressive deterioration within the central nervous system and debilitating neurological symptoms [1]. The underlying cause of the neurodegeneration in PD is still only partially understood, but key pathophysiological features are the gradual loss of dopaminergic neurons in a part of the midbrain known as the substantia nigra pars compacta (SNpc), and the accumulation of misfolded alpha-synuclein protein in 'Lewy bodies' within the cytoplasm of neuronal cells in several brain regions [2]. The dopaminergic pathway between the SNpc and the dorsal striatum, known as the nigrostriatal pathway, is critical for movement control. Hence, disruption of the nigrostriatal pathway results in motor abnormalities in individuals with PD, including tremor, rigidity, and bradykinesia [3]. Affected individuals also experience non-motor symptoms, including constipation, depression, sleep disorders, and a reduced sense of smell [1,3].
Between 1990 and 2016, the number of people diagnosed with PD more than doubled, from 2.5 million to 6.1 million, while the age-standardized prevalence rate increased by 21.7% [4]. PD is thus one of the most prevalent neurological disorders, with immense societal impact, yet it has no curative treatment [5]. The gold standard treatment for PD is the dopamine precursor amino acid levodopa, which, at least in the initial stages of PD, can alleviate many motor symptoms by compensating for striatal dopamine loss [6]. However, its use can be complicated by the development of motor complications, including drug-induced dyskinesias, and patients also experience levodopa-resistant motor features, including treatment-resistant tremor, postural instability, and swallowing and speech disorders [2]. A range of modifications to dopaminergic treatment, as well as non-dopaminergic pharmacological therapies and non-pharmacological treatments such as deep brain stimulation, may be required over time. Rehabilitation and psychosocial support are also key to maintaining affected individuals' quality of life, so early diagnosis, which allows expert multidisciplinary care to begin, is a key priority. Moreover, novel therapies that may modify the underlying disease processes are the goal of a large body of global research: such advanced therapeutics, for example gene therapy, will likely need to be started as early as possible to have maximal effect, as has been found for other degenerative conditions such as spinal muscular atrophy [7]. Early diagnosis is therefore crucial to the optimal current and future management of PD, to ensure maximal functional outcomes for affected individuals.
At present, the diagnosis of PD is based on core clinical features, and the accuracy of clinical diagnosis can be improved by following standard clinical criteria, such as the UK Parkinson's Disease Society Brain Bank (UKPDSBB) criteria [8], which include the presence of bradykinesia and the absence of certain exclusion criteria. These clinical criteria rely on the expertise of a neurologist but are still imperfect: for example, the diagnostic accuracy of the UKPDSBB criteria, even in specialist neurology centres, is only slightly above 80% when compared with post-mortem pathological examination as the gold standard [9]. Moreover, there is a global shortage of neurologists, especially in countries with aging populations, where neurological disorders are common [10]. This increases the waiting time for affected individuals to be diagnosed with PD. By the time a diagnosis is made, typically 60% of the dopaminergic neurons have already been lost [2].
In efforts to meet these healthcare demands, there is interest in using CAD tools based on artificial intelligence methods, namely machine learning (which typically involves more conventional pattern recognition approaches) or deep learning (which involves multi-layered artificial neural networks), to perform automated diagnosis of PD [11][12][13]. These CAD tools can perform automated detection using biomarkers of PD, such as electroencephalogram (EEG) signals, posture analysis in the gait cycle, voice aberrations, or brain imaging such as magnetic resonance imaging (MRI) and positron emission tomography (PET) [14]. In a conventional machine learning model, features must first be extracted from the biomarkers, and the most salient of them selected, before the model can be trained [15][16][17][18][19]. This step is required because conventional machine learning models are not well suited to learning from high-dimensional data in raw form; without it, the model is likely to overfit the dataset [20]. Moreover, selecting the most relevant features requires an experienced practitioner who is knowledgeable about the various feature selection tools [15,16]. This has contributed to the relatively poor adoption of machine learning models as CAD tools, because feature extraction and selection are complicated procedures that machine learning experts understand but the end-users of a CAD tool often do not [21,22]. Such end-users may include healthcare experts such as practicing clinicians, health researchers, or practitioners in other application domains.
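The conventional pipeline described above (hand-crafted feature extraction, then feature selection, then a classifier) can be sketched in plain Python. This is a minimal illustration under stated assumptions: the four features, the Fisher-style ranking score, and the synthetic slow and fast oscillations are hypothetical choices, not the features used by any reviewed study. The point is that a human must decide which features to compute and how to rank them before any model is trained.

```python
import math
import random
import statistics

def extract_features(signal):
    """Hand-crafted summary features (a hypothetical set chosen by a human expert)."""
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if (a < 0) != (b < 0))
    return {
        "mean": statistics.fmean(signal),
        "std": statistics.pstdev(signal),
        "peak_to_peak": max(signal) - min(signal),
        "zcr": crossings / (len(signal) - 1),  # zero-crossing rate, a crude frequency proxy
    }

def fisher_score(values_a, values_b):
    """Squared gap between class means divided by the summed within-class variances."""
    gap = statistics.fmean(values_a) - statistics.fmean(values_b)
    spread = statistics.pvariance(values_a) + statistics.pvariance(values_b)
    if spread == 0:
        return math.inf if gap != 0 else 0.0
    return gap * gap / spread

def select_features(signals_a, signals_b, k):
    """Rank every hand-crafted feature by class separation and keep the top k names."""
    feats_a = [extract_features(s) for s in signals_a]
    feats_b = [extract_features(s) for s in signals_b]
    scores = {name: fisher_score([f[name] for f in feats_a], [f[name] for f in feats_b])
              for name in feats_a[0]}
    return sorted(scores, key=scores.get, reverse=True)[:k]

def make_signal(cycles, amp, rng, n=100):
    """Synthetic sine wave with small uniform noise (illustrative data only)."""
    return [amp * math.sin(2 * math.pi * cycles * t / n + 0.3) + rng.uniform(-0.05, 0.05)
            for t in range(n)]

rng = random.Random(42)
slow = [make_signal(2, 0.8 + 0.04 * i, rng) for i in range(10)]   # hypothetical class A
fast = [make_signal(10, 0.8 + 0.04 * i, rng) for i in range(10)]  # hypothetical class B
print(select_features(slow, fast, k=1))
```

Because the two synthetic classes differ mainly in frequency, the zero-crossing rate should rank first; with real biomarkers, choosing candidate features and a ranking criterion is exactly the expert step that deep learning aims to remove.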
Deep learning models, which are of increasing interest with the growth of big data, can resolve some of the limitations of machine learning models by eliminating the need for separate feature extraction and feature selection tools. Such models are capable of learning directly from high-dimensional data, and their design is loosely inspired by the neurons in the human brain [23]. A conventional artificial neural network (ANN) consists of three main layers: the input, the hidden, and the output layer, as shown in Figure 1. All three layers contain artificial neurons that are interconnected, as denoted by the black lines. As the network learns via a learning algorithm (e.g., backpropagation), the weights of the connections (black lines) between the neurons are updated iteratively [23]. Each neuron, acting as an individual classifier, determines its output signal by processing the weighted inputs from its previous connections [23]. When an ANN has more than one hidden layer, it is known as a deep neural network (DNN), and such networks are capable of learning data with a higher degree of complexity [23] (Figure 1). Other classes of deep learning models, such as the convolutional neural network (CNN), the recurrent neural network (RNN), and long short-term memory (LSTM) networks, use the DNN as their basic underlying architecture.
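The iterative weight update described above can be made concrete with a very small network: two inputs, one hidden layer of two sigmoid neurons, and a single output neuron, trained by backpropagation on one example. The starting weights, input, learning rate, and squared-error loss are hypothetical choices for illustration; a DNN would simply stack further hidden layers and apply the same chain-rule update layer by layer.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, w2, b2):
    """Input layer -> one hidden layer -> one output neuron."""
    # Each hidden neuron weights every input, adds its bias, and applies a sigmoid.
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    # The output neuron does the same with the hidden activations.
    y = sigmoid(sum(w * hj for w, hj in zip(w2, h)) + b2)
    return h, y

def loss(y, target):
    return 0.5 * (y - target) ** 2

def train_step(x, target, W1, b1, w2, b2, lr=0.5):
    """One backpropagation update of every connection weight for the squared-error loss."""
    h, y = forward(x, W1, b1, w2, b2)
    delta_out = (y - target) * y * (1.0 - y)  # dLoss/d(output pre-activation)
    # Error signal sent back to each hidden neuron (computed with the old output weights).
    delta_hid = [delta_out * w2[j] * h[j] * (1.0 - h[j]) for j in range(len(h))]
    w2 = [w2[j] - lr * delta_out * h[j] for j in range(len(h))]
    b2 = b2 - lr * delta_out
    W1 = [[W1[j][i] - lr * delta_hid[j] * x[i] for i in range(len(x))] for j in range(len(W1))]
    b1 = [b1[j] - lr * delta_hid[j] for j in range(len(b1))]
    return W1, b1, w2, b2

# Hypothetical starting weights and a single training example.
x, target = [0.5, -1.0], 1.0
params = ([[0.1, 0.2], [-0.3, 0.4]], [0.0, 0.0], [0.5, -0.5], 0.0)
before = loss(forward(x, *params)[1], target)
for _ in range(50):
    params = train_step(x, target, *params)
after = loss(forward(x, *params)[1], target)
print(f"loss before: {before:.4f}, after 50 updates: {after:.4f}")
```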

Convolutional Neural Network (CNN)
In a CNN model, the input layer of a typical DNN is replaced by a series of convolutional and pooling layers, as shown in Figure 2. If a DNN is likened to the neurons in our brain, then a CNN may be likened to the human visual system [24]. The first convolutional layer contains numerous filters that extract features from the input image to generate multiple feature maps. The subsequent pooling and convolutional layers reduce the dimensions of the feature maps and further enhance the features, thereby reducing the complexity of the feature maps and the likelihood of overfitting [25]. This is analogous to the human visual system, in which the visual cortex breaks down images into simpler representations so that the brain can perceive the image with ease [24]. After the final pooling layer, the feature maps are converted into a one-dimensional vector at the flatten layer (Figure 2). The fully connected layers of neurons then learn to recognize features from this vector and perform image classification [25]. CNN models are therefore known for their exemplary image recognition ability, and many studies have demonstrated their success in medical imaging, including the recognition of breast tumors and eye diseases from mammograms and color fundus images, respectively [26]. Beyond medical images, CNNs have also been successful in biometric face recognition systems for human tracking purposes [27,28].
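The convolution, pooling, and flatten stages described above can be sketched in plain Python. This is a minimal illustration that assumes a single hand-set 2x2 edge-detecting filter and a ReLU activation; in a trained CNN the filter values are learned and each layer has many filters. As in most CNN libraries, the "convolution" here is technically a cross-correlation, because the kernel is not flipped.

```python
def conv2d(image, kernel):
    """'Valid' 2-D convolution (cross-correlation): slide the kernel over every position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(kernel[a][b] * image[i + a][j + b] for a in range(kh) for b in range(kw))
             for j in range(out_w)] for i in range(out_h)]

def relu(fmap):
    """Keep positive responses, zero out the rest."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]

def flatten(fmap):
    """Turn the final feature map into the single vector fed to the fully connected layers."""
    return [v for row in fmap for v in row]

image = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]  # dark left, bright right
kernel = [[-1, 1], [-1, 1]]                              # responds to dark-to-bright edges
features = flatten(max_pool(relu(conv2d(image, kernel))))
print(features)  # [2.0, 0.0, 2.0, 0.0]
```

The 5x5 image shrinks to a 4x4 feature map after convolution and to 2x2 after pooling, illustrating how these layers reduce dimensionality while keeping the strongest responses.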

Long Short-Term Memory (LSTM)
The LSTM model is an improvement on its predecessor, the RNN [16]. As its name suggests, the LSTM model attempts to mimic how the brain stores memories and makes predictions based on recent past events stored in memory [24]. Both RNN and LSTM models are known for their ability to recognize patterns in sequential data [16]. However, the vanishing gradient problem is common in RNN models: when a large gap separates relevant past inputs from the current output, the error signals propagated back through time shrink towards zero during training. As a result, the RNN model is unable to learn data that have long-term dependencies. The LSTM model was developed to resolve the vanishing gradient problem of RNN models [29].
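The vanishing gradient problem can be seen in a single-unit RNN, h_t = tanh(w_rec·h_{t−1} + w_in·x_t). By the chain rule, the gradient of the last hidden state with respect to the first is a product of one factor w_rec·(1 − h_t²) per time step; when these factors are below 1, the product shrinks exponentially with sequence length. The weights and inputs below are hypothetical.

```python
import math

def rnn_gradient(inputs, w_rec, w_in, h0=0.0):
    """Single-unit RNN h_t = tanh(w_rec * h_{t-1} + w_in * x_t).
    Returns d h_T / d h_0: how much the first state still influences the last one."""
    h, grad = h0, 1.0
    for x in inputs:
        h = math.tanh(w_rec * h + w_in * x)
        grad *= w_rec * (1.0 - h * h)  # chain rule: dh_t/dh_{t-1} = w_rec * tanh'(pre-activation)
    return grad

for steps in (1, 10, 50):
    print(steps, rnn_gradient([0.1] * steps, w_rec=0.5, w_in=1.0))
```

After one step the gradient is close to 0.5, but after 50 steps it is below 0.5^50, so the earliest inputs contribute almost nothing to the weight updates.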
The neurons in a typical LSTM model adopt a gate structure [30] consisting of a forget gate, an input gate, and an output gate (Figure 3). The input gate decides whether new information (x_t) should be stored in the cell, the output gate decides what information should be output as the hidden state (h_t), and the key to eliminating the vanishing gradient problem lies in the forget gate [30,31]. The sigmoid (σ) function in the forget gate determines whether the information carried over from the previous cell state (C_{t−1}) should be kept or forgotten, thereby removing irrelevant data and resetting the information in the cell appropriately [30,31]. Because the cell state is updated additively under the control of the forget gate, rather than being repeatedly squashed, error signals can flow back through many time steps without vanishing. Useful information therefore continues to be backpropagated through the LSTM model, allowing it to memorize patterns with long-term dependencies [30,31]. Hence, the strong pattern recognition ability of LSTM models is widely exploited in applications such as speech and handwriting recognition [32,33]. LSTM models are also suited to forecasting stock prices in financial markets, which are dynamic and non-linear in nature [34,35].
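The gate structure described above can be written as one step of a single-unit LSTM cell. This is a sketch under simplifying assumptions: scalar weights instead of weight matrices, and hand-picked (hypothetical) parameters chosen to show the forget gate at work. With the forget gate biased open, the cell state survives 20 time steps almost unchanged; biased shut, the old content is erased.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One time step of a single-unit LSTM cell.
    p maps each gate name to (input weight, recurrent weight, bias)."""
    def gate(name, activation):
        w_x, w_h, b = p[name]
        return activation(w_x * x_t + w_h * h_prev + b)
    f_t = gate("forget", sigmoid)           # fraction of C_{t-1} to keep
    i_t = gate("input", sigmoid)            # fraction of the candidate to write
    o_t = gate("output", sigmoid)           # fraction of the cell exposed as h_t
    c_tilde = gate("candidate", math.tanh)  # candidate new cell content
    c_t = f_t * c_prev + i_t * c_tilde      # additive cell-state update
    h_t = o_t * math.tanh(c_t)              # new hidden state
    return h_t, c_t

def run(inputs, p, h0=0.0, c0=1.0):
    """Feed a sequence through the cell, starting from cell state c0."""
    h, c = h0, c0
    for x in inputs:
        h, c = lstm_step(x, h, c, p)
    return h, c

shared = {"output": (1.0, 1.0, 0.0), "candidate": (1.0, 1.0, 0.0)}
remember = {**shared, "forget": (0.0, 0.0, 10.0), "input": (0.0, 0.0, -10.0)}
forget = {**shared, "forget": (0.0, 0.0, -10.0), "input": (0.0, 0.0, -10.0)}
print(run([0.5] * 20, remember)[1])  # stays close to the initial 1.0
print(run([0.5] * 20, forget)[1])    # close to 0: the old content has been erased
```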

Materials and Methods
This systematic review applied the PRISMA model [36] to analyze the most relevant studies on PD detection using deep learning models from January 2011 to July 2021. PubMed, Google Scholar, IEEE, and Science Direct were systematically searched using the Boolean search strings shown in Table 1. A total of 794 studies containing these search strings were identified: 178 from PubMed, 248 from Google Scholar, 135 from IEEE, and 233 from Science Direct. Of these 794 articles, 110 duplicates were removed. A further 612 articles (61 traditional machine learning studies, one non-human study, 104 conference papers, 402 non-CAD studies of PD, 14 irrelevant studies, 14 non-English articles, and 16 books) were excluded based on their relevance to this review. Eight more studies were removed because they did not report model accuracy. The final number of studies that qualified for inclusion in this review was 63. Figure 4 shows the PRISMA selection process in detail.
Table 1. Search terms and number of studies identified in each database.
Search terms: "parkinson" AND "disease"; "Neural network"; "Deep learning"; "Prediction" OR "Diagnosis" OR "Detection".
Database | Studies identified
PubMed | 178
Google Scholar | 248
IEEE | 135
Science Direct | 233

Results
There are two parts to this section. Section 3.1, Brain Analysis, covers 23 deep learning studies performed on single photon emission computed tomography (SPECT), PET, MRI, ultrasound, and EEG. Section 3.2, Motor Symptoms, covers 40 deep learning studies performed on gait, handwriting, speech, electromyogram (EMG), and other movement-related tests. Details of the deep learning studies in the brain analysis and motor symptoms categories are given in Appendix A, Tables A1 and A2, respectively.

Brain Analysis
A majority of the studies that focused on image analysis proposed CNN models for automated detection of PD (Figure 5). For SPECT imaging, the highest-performing CNN model was developed by Choi et al. [37], who evaluated their proposed model (PD net) on two datasets: the PPMI dataset, on which it obtained an accuracy of 96%, and a private dataset (SNUH cohort), on which it obtained an accuracy of 98.8% (Figure 7, Appendix A Table A1). Both results exceeded the performance of two human raters, whose accuracies on the PPMI dataset were 90.7% and 84%, respectively. Only one study, by Ozsahin et al. [40], proposed a back-propagation neural network (BPNN); it achieved the highest model accuracy of 99.6% using binarized SPECT images (Figure 7, Appendix A Table A1). However, a majority of SPECT studies advocated the CNN model (Figure 5). In any event, we contend that, for practical and ethical purposes, the suitability of the CNN or BPNN model for SPECT imaging should still be assessed via clinical trials. For PET and MRI, the highest-performing CNN models achieved accuracies of 93% [41] and 95.3% [42], respectively (Figure 7, Appendix A Table A1).
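Ozsahin et al. [40] trained their BPNN on binarized SPECT images. The review does not say how the images were binarized, so the sketch below assumes a simple global threshold at a fixed fraction (hypothetically 0.5) of the brightest pixel, then flattens the result into the kind of input vector a fully connected network expects. The patch values are synthetic.

```python
def binarize(image, fraction=0.5):
    """Set a pixel to 1 if it reaches `fraction` of the brightest pixel in the image, else 0."""
    threshold = fraction * max(max(row) for row in image)
    return [[1 if v >= threshold else 0 for v in row] for row in image]

def to_input_vector(binary_image):
    """Flatten row by row so the image can feed the input layer of a fully connected network."""
    return [v for row in binary_image for v in row]

# Hypothetical 3x3 patch of uptake intensities.
patch = [[10, 80, 90],
         [5, 70, 20],
         [0, 15, 100]]
print(to_input_vector(binarize(patch)))  # [0, 1, 1, 0, 1, 0, 0, 0, 1]
```

A global threshold is the simplest choice; adaptive methods such as Otsu's threshold are common alternatives when intensity ranges vary between scans.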
To date, only Shen et al. [43] have used ultrasound, namely transcranial sonography (TCS) images, for automated PD detection (Appendix A Table A1). They proposed a deep learning model known as the multiple kernel mapping-broad learning system (MEKM-BLS), which has wider layers of feature and enhancement nodes (neurons) than a typical DNN model. This method can map the features from the feature nodes directly onto the enhancement nodes. However, their model achieved an accuracy of only 78.4%, lower than those reported for MRI, PET and SPECT. Nonetheless, ultrasonography has several advantages, such as low cost, speed, and the absence of radiation exposure [44]. Furthermore, Mehnert et al. [44] demonstrated that TCS interpretation by experienced sonographers can reach a sensitivity of 95% for PD diagnosis. Hence, there is room for improvement in ultrasonography-based automated PD detection, and future work should consider implementing CNN models to interpret TCS images. Apart from brain imaging, physiological signals such as EEG can also reflect brain abnormalities characteristic of PD [45]. In particular, the EEG of a PD patient has been reported to be abnormally slow compared with that of a healthy individual [46]. In this review, we found six studies that proposed deep learning models to recognize EEG characteristics for automated detection of PD. Half of these studies proposed CNN models [25,47,48], and the remaining three proposed an RNN [49], a DNN [50], and a hybrid deep learning model that combines CNN and RNN algorithms [51] (Figure 5). The best-performing model was developed by Khare et al. [47], who proposed a CNN model that takes smoothed pseudo-Wigner-Ville distribution (SPWVD) features of EEG signals as input, and obtained an accuracy of nearly 100% (Figure 7, Appendix A Table A1).
This suggests that CNN models are likely to achieve high classification accuracy for one-dimensional data such as EEG signals. Like ultrasound, EEG is relatively inexpensive and offers a low-risk alternative to MRI, PET, and SPECT; but unlike ultrasound, the overall accuracy of studies that used EEG signals (95.8%) is on par with that of studies that used SPECT images (94.1%) (Figure 6).
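Khare et al. [47] converted EEG signals into SPWVD time-frequency images before classifying them with a CNN. The SPWVD is more elaborate than fits a short sketch, so the example below swaps in a simpler short-time Fourier transform (STFT) to show the general idea: a one-dimensional trace becomes a two-dimensional time-frequency array that a CNN can take as input. The sampling rate, window, hop, and the pure 5 Hz test signal are hypothetical; a rectangular window is used for simplicity, although practical pipelines usually taper each frame.

```python
import math

def dft_magnitudes(frame):
    """Magnitudes of a naive O(N^2) discrete Fourier transform, bins 0..N/2."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        mags.append(math.hypot(re, im))
    return mags

def spectrogram(signal, window, hop):
    """STFT magnitudes (rectangular window): rows are time frames,
    columns are frequency bins spaced fs / window apart."""
    return [dft_magnitudes(signal[start:start + window])
            for start in range(0, len(signal) - window + 1, hop)]

fs = 64                                   # hypothetical sampling rate in Hz
window = 64                               # 1 s frames, so bins are 1 Hz apart
signal = [math.sin(2 * math.pi * 5 * t / fs) for t in range(4 * fs)]  # 4 s of a 5 Hz rhythm
spec = spectrogram(signal, window, hop=32)
peak_hz = [row.index(max(row)) * fs / window for row in spec]
print(len(spec), "frames; dominant frequency per frame:", peak_hz)
```

In such a representation, a slowing of the EEG would appear as energy shifting towards the lower-frequency columns, which is the kind of pattern a CNN could learn to detect.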

Motor Symptoms
Since PD is characterized by impaired motor control, motor assessments can be used for the diagnosis of PD. Such assessments include gait, handwriting, speech, and other movement-related tests, as illustrated in Figure 8. Gait refers to the walking pattern of an individual. In PD, the body's stiffness and postural instability may worsen as the disease progresses, leading to gait disturbance [52]. Gait features can therefore be used to train deep learning models to detect PD. Key gait features include kinetic features, such as ground reaction force (GRF), and kinematic features, such as the stance and swing phases of the foot [52]. Eleven deep learning studies have so far analyzed gait for PD detection, proposing a wide variety of deep learning models (Figure 9, Appendix A Table A2). Among them, two studies that proposed hybrid models combining CNN and LSTM achieved a high overall accuracy [53,54] (Figure 9). The best-performing hybrid CNN-LSTM model was proposed by Xia et al. [53], using vertical GRF measured at multiple time points during the gait cycle. The idea behind a hybrid CNN-LSTM model for gait analysis is that the CNN layers extract the salient gait features while the LSTM layers analyze the temporal pattern of these features across a walking cycle. Xia et al. [53] achieved the highest model accuracy of 99.1% (Figure 9, Appendix A Table A2), using a dataset from three research groups [55][56][57]. Similarly, two other studies that proposed DNN [58] and LSTM [59] models also achieved high performance, on par with the CNN-LSTM model (Figure 9, Appendix A Table A2). Hence, future deep learning studies based on gait analysis could focus on the development and implementation of these three models.
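The stance and swing phases mentioned above can be recovered from a vertical GRF trace by thresholding: the foot is in stance while it loads the force plate and in swing while it does not. This is a minimal sketch, not the preprocessing used by Xia et al. [53]; it assumes one foot's force signal, a hypothetical 20 N contact threshold, a hypothetical 10 Hz sampling rate, and synthetic values. Per-phase durations like these are the kind of temporal features a CNN-LSTM could learn to extract automatically.

```python
def gait_phases(vertical_grf, threshold, fs):
    """Label each sample of one foot's vertical GRF trace as stance (force above threshold,
    foot on the ground) or swing (foot in the air), then merge runs into (phase, seconds)."""
    runs = []
    for force in vertical_grf:
        label = "stance" if force > threshold else "swing"
        if runs and runs[-1][0] == label:
            runs[-1][1] += 1            # extend the current run
        else:
            runs.append([label, 1])     # start a new run
    return [(label, count / fs) for label, count in runs]

fs = 10                                                    # hypothetical sampling rate in Hz
grf = [0] * 3 + [400] * 6 + [0] * 4 + [450] * 6 + [0] * 2  # synthetic forces in newtons
print(gait_phases(grf, threshold=20, fs=fs))
# [('swing', 0.3), ('stance', 0.6), ('swing', 0.4), ('stance', 0.6), ('swing', 0.2)]
```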
The deterioration of handwriting ability is another telltale symptom of PD; it is seen in a majority of PD patients but is not included as a diagnostic criterion [60]. A PD patient may exhibit abnormally small handwriting, termed micrographia, due to rigidity and tremors in the writing arm [61]. Thirteen deep learning studies have attempted to diagnose PD from handwritten drawings using one of three common PD handwriting datasets: PaHaW [62], HandPD [63], and NewHandPD [64]. All three datasets involve a series of drawing and writing tests, and one test common to all three is the spiral drawing test. As with brain imaging, most studies proposed CNN models to differentiate the handwritten drawings of PD patients from those of healthy controls (Figure 10). The best performance was achieved by Kamran et al. [65], who tested six common CNN architectures for transfer learning, namely AlexNet [66], GoogleNet [67], VGGNet-16/19 [68], and ResNet-50/101 [69]. These models had previously been trained on ImageNet, a well-known dataset of more than 1 million images. Kamran et al. [65] then fine-tuned the pretrained models on handwritten drawings of PD patients and healthy controls, and the highest model accuracy, 99.22%, was achieved by AlexNet [66] (Figure 10, Appendix A Table A2).
Only two studies have so far used small-scale movement-related tests, namely swallowing [70] and finger tapping [71] (Appendix A Table A2). Each study proposed a different deep learning model, and the better performance, 82.3%, was achieved by Jones et al. [70] using an ANN model with videofluoroscopic and manometric data collected while boluses were delivered to the subject's oral cavity using a syringe. Videofluoroscopic data include information such as laryngeal, hyoid, and epiglottic movement, while manometric data include information such as the rise time and rise rate of pressure in the velopharynx and mesopharynx.
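The transfer learning approach of Kamran et al. [65], described above for handwriting, reuses layers trained on one dataset and adapts the network to a new task. The sketch below shows its simplest form: the pretrained part is frozen and only a new classification head is trained. Everything here is hypothetical: a tiny fixed linear-plus-ReLU layer stands in for a pretrained backbone such as AlexNet, and the 2-D points stand in for drawings. Full fine-tuning may additionally update some or all of the pretrained layers, usually with a small learning rate.

```python
import math

# Hypothetical stand-in for a pretrained backbone (e.g. convolutional layers trained on
# ImageNet). Its weights stay frozen: training never modifies them.
FROZEN_W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

def frozen_features(x):
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_W]

def train_head(samples, labels, epochs=500, lr=0.1):
    """Train only a new logistic-regression head on the frozen features (SGD on log-loss)."""
    feats = [frozen_features(x) for x in samples]
    w, b = [0.0] * len(FROZEN_W), 0.0
    for _ in range(epochs):
        for f, label in zip(feats, labels):
            z = sum(wi * fi for wi, fi in zip(w, f)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - label  # d(log-loss)/dz
            w = [wi - lr * err * fi for wi, fi in zip(w, f)]
            b -= lr * err
    return w, b

def predict(x, w, b):
    z = sum(wi * fi for wi, fi in zip(w, frozen_features(x))) + b
    return 1 if z > 0 else 0

# Hypothetical 2-D inputs: label 1 = "PD-like", label 0 = "control-like".
samples = [[2.0, 2.0], [1.5, 2.5], [2.5, 1.0], [-2.0, -2.0], [-1.0, -2.5], [-2.5, -1.0]]
labels = [1, 1, 1, 0, 0, 0]
w, b = train_head(samples, labels)
print([predict(x, w, b) for x in samples])
```

Freezing the backbone keeps the number of trainable parameters small, which is one reason transfer learning suits modest datasets such as handwriting collections.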
Besides visible movement disorders, the muscle control of speech is also affected in PD [72]. As a consequence, people with PD experience voice abnormalities such as lower voice volume and slurred speech [72]. Twelve studies have attempted to use voice aberrations to diagnose PD (Figure 11, Appendix A Table A2). A wide variety of deep learning models were proposed, half of them CNN models (Figure 11). Two of the CNN models achieved high model accuracies of 99.5% [73] and 99.4% [74] (Figure 11, Appendix A Table A2). However, the best-performing model was developed by Ali et al. [75], who proposed a genetically optimized neural network (GONN) with a model accuracy of 100% (Figure 11, Appendix A Table A2). At present, more studies support the CNN model for speech analysis. Nonetheless, clinical trials are required to determine whether GONN or CNN is the better alternative for speech analysis. As with the analysis of the brain, motor symptoms of PD can also be assessed through physiological signals, namely EMG. However, only one deep learning study has used EMG for PD diagnosis, with an ANN model [76], and its performance (71%) was lower than that of the studies that focused on gait, handwriting, and speech (Appendix A Table A2). Hence, more research is required before EMG can be recognized as a potential biomarker for PD diagnosis. Until then, handwriting and speech recordings, which have simpler data collection procedures, appear to be better alternatives to EMG.
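The review does not describe how Ali et al. [75] encoded their network for genetic optimization, so the following is only a generic sketch of a genetic algorithm, not their GONN. It evolves a single hypothetical hyperparameter (a number of hidden neurons) through selection, crossover, mutation, and elitism. The fitness function is a stand-in: in a real GONN-style setting it would be the validation accuracy of a network trained with that setting.

```python
import random

def genetic_search(fitness, low, high, pop_size=10, generations=20, mutation_rate=0.2, seed=0):
    """Evolve integer candidates in [low, high] towards higher fitness."""
    rng = random.Random(seed)
    population = [rng.randint(low, high) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]          # selection: keep the fitter half
        children = [ranked[0]]                     # elitism: the best survives unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = (a + b) // 2                   # crossover: blend two parents
            if rng.random() < mutation_rate:
                child += rng.randint(-3, 3)        # mutation: small random change
            children.append(min(high, max(low, child)))
        population = children
    return max(population, key=fitness)

# Stand-in fitness that simply peaks at a hypothetical optimum of 24 hidden neurons.
def toy_fitness(n):
    return 1.0 - abs(n - 24) / 100

best = genetic_search(toy_fitness, low=1, high=64)
print(best, toy_fitness(best))
```

Elitism guarantees that the best candidate found so far is never lost, so the returned solution is at least as fit as the best member of the initial random population.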
Lastly, two studies did not limit themselves to a single modality (Appendix A Table A2). Vasquez-Correa et al. [77] used three input signals (speech, handwriting, and gait) for multimodal analysis of PD using a CNN model and achieved 97.6% accuracy. Oung et al. [78] used two input signals, speech and motion data from wearable sensors, and proposed an extreme learning machine (ELM) for the detection of PD. The ELM architecture is similar to that of an ANN with a single hidden layer, but its training process differs. In an ELM, the input-to-hidden weights are assigned randomly and left fixed, and only the output weights are learned, in a single step, by solving a least-squares problem; this results in much faster training and less overfitting than an iteratively trained ANN [79]. The ELM model of Oung et al. [78] achieved an accuracy of 95.9%, comparable to that of the CNN model proposed by Vasquez-Correa et al. [77] (Appendix A Table A2). Based on a synthesis of this information, we conclude that deep learning models capable of multimodal analysis of PD may be a useful practical tool for neurologists. In the future, as more clinical information, and particularly detailed and correctly labelled electronic datasets, becomes available, deep learning models may further aid in the diagnosis of PD. Hence, future deep learning studies should consider using multiple types of input signals for PD detection, instead of relying on a single modality.
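The single-step ELM training described above can be sketched in plain Python. The standard ELM formulation computes the output weights with the Moore-Penrose pseudoinverse of the hidden-layer output matrix; this sketch instead solves a ridge-regularized least-squares system, a common and numerically safer variant. The hidden-layer size, ridge strength, seed, and XOR-style toy data are hypothetical choices, and this is not the model of Oung et al. [78].

```python
import math
import random

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def hidden_layer(x, W, b):
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bi) for row, bi in zip(W, b)]

def elm_train(X, y, n_hidden=10, ridge=1e-3, seed=0):
    rng = random.Random(seed)
    # Step 1: random input-to-hidden weights and biases, drawn once and never trained.
    W = [[rng.uniform(-1, 1) for _ in X[0]] for _ in range(n_hidden)]
    b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    H = [hidden_layer(x, W, b) for x in X]
    # Step 2: output weights in one shot from (H^T H + ridge * I) beta = H^T y.
    HtH = [[sum(h[i] * h[j] for h in H) + (ridge if i == j else 0.0) for j in range(n_hidden)]
           for i in range(n_hidden)]
    Hty = [sum(h[i] * t for h, t in zip(H, y)) for i in range(n_hidden)]
    return W, b, solve(HtH, Hty)

def elm_predict(model, x):
    W, b, beta = model
    return sum(bt * h for bt, h in zip(beta, hidden_layer(x, W, b)))

# Hypothetical toy data: an XOR-style pattern with targets -1 / +1.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y = [-1.0, 1.0, 1.0, -1.0]
model = elm_train(X, y)
print([round(elm_predict(model, x), 2) for x in X])
```

There is no iterative loop over epochs: the only "learning" is one linear solve, which is where the speed advantage of the ELM comes from.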

Discussion
There are five parts to this section. Section 4.1 summarizes the results presented in the previous section. Section 4.2 discusses the challenges affecting the adoption of CAD in healthcare. Section 4.3 proposes solutions to the challenges highlighted in Section 4.2, Section 4.4 describes the future vision of CAD tools in the diagnosis of PD, and Section 4.5 lists the limitations of this review.

Result Summary
The application of deep learning models as CAD tools for automated diagnosis of PD has been gaining popularity in recent years. As Figure 12 shows, the number of deep learning studies published in 2021 had reached 12 by July, already more than half of the number published in 2020 (18 studies). Hence, it is very likely that the number of studies by the end of 2021 will exceed that of 2020. In every year, the number of deep learning studies based on motor symptoms exceeds the number based on brain analysis (Figure 12). This might be due to the ease of data acquisition for motor symptoms, as data collection is less complicated than for brain analysis and most of the datasets are publicly available. The overall model performance achieved by deep learning studies in each modality is favorable, especially for common modalities such as MRI, PET, SPECT, EEG, gait, handwriting, and speech, whose overall model accuracies all exceeded 80% (Figure 13).
This review underscores the following key aspects of current deep learning studies for automated PD diagnosis:
• Deep learning models proposed by various studies have achieved high predictive accuracy for the diagnosis of PD (Figure 13).
• About 57% of the deep learning studies for automated PD detection proposed using the CNN model (Figure 14).
• CNN models have demonstrated high prediction accuracy for image classification tasks such as brain imaging (SPECT, PET, and MRI) and handwriting recognition.
• Our results also show that CNN models perform well in detecting abnormalities from one-dimensional signals such as EEG [47] and speech [73].
• Gait analysis, on the other hand, seems to perform better with a hybrid CNN-LSTM model, a DNN, or an LSTM model. However, more research is required to determine the best-performing model.
• Apart from CNN models, Ozsahin et al. [40] and Ali et al. [75] proposed BPNN and GONN models for SPECT and speech analysis, respectively, and obtained the highest prediction accuracy.
• However, clinical trials are required to prove the suitability of the proposed deep learning model for each modality.
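The expectation that 2021 would overtake 2020 can be made explicit with a simple linear extrapolation. This assumes, as a rough approximation only, that studies continued to appear at the same monthly rate as from January to July 2021 (12 studies in 7 months):

```latex
\text{projected 2021 total} \approx 12 \times \frac{12}{7} \approx 20.6 \;>\; 18 \quad (\text{2020 total})
```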

Challenges Faced by CAD Tools in Healthcare Adoption
Despite the high prediction accuracy obtained by many deep learning models proposed in automated PD detection studies, the adoption of deep learning models as CAD tools in healthcare is currently not supported [21,22]. In their current form, neither neurologists nor other healthcare workers are comfortable relying on CAD tools to diagnose PD. This is due to several challenges, listed below:
• Lack of standards
The diagnosis of PD has relied on clinical features for many years, and neurologists have been trained to recognize sets of clinical features to reach a diagnosis [8]. For instance, the diagnostic criteria provided by the UKPDSBB (i.e., presence of bradykinesia and absence of certain exclusion criteria) have not been adopted by current deep learning studies, or even by machine learning studies. Moreover, a majority of the deep learning studies in this review focused on only one modality rather than adopting a multimodal approach, which is not practical for clinical use. Deep learning models also do not recognize the features of PD in the way a human neurologist would. For example, deep learning models can detect PD from brain imaging by means of a vectorized image instead of a clinical feature, which does not follow the existing diagnostic criteria [80]. Hence, neurologists may be hesitant to use CAD tools that deviate greatly from their established practice or that do not provide a clinically trusted artificial intelligence framework that is explainable and interpretable for clinical practice.

• Poor interpretability
Deep learning models are often described as 'black boxes' because it is very difficult to understand the mechanisms by which a model arrives at a given prediction [22,23]. Despite high prediction accuracy, end-users of CAD tools (e.g., neurologists and healthcare workers) cannot make a diagnosis without sufficient evidence, and deep learning models do not currently provide this evidence [21,23]. Hence, neurologists are unable to trust CAD tools, as they cannot make a diagnosis without concrete evidence and without an explainable, interpretable account of how the black-box method produced its outcome.

• Psychological barriers
In the healthcare industry, human behavior must always be considered when designing a CAD tool for a target audience. The common psychological barriers affecting the adoption of new technologies are the endowment effect and the status quo bias. The endowment effect is the tendency of individuals to value their possessions more highly than their market value [81], whereas the status quo bias is an individual's preference to remain in their comfort zone and keep their environment in the same state [82]. Both of these emotional biases are likely to cause an individual, a neurologist for example, to feel a significant sense of loss when switching from manual diagnosis to relying on a CAD tool for diagnosis.
There are many other factors, such as the difficulty of obtaining regulatory approval and poor interoperability, that is, the limited ability of two systems to communicate and exchange data [22]. For example, if two hospitals use different electronic health systems, their data may not be coherent and the systems may be unable to communicate with each other. These two concerns, however, arise only after a prototype of the CAD tool has been developed. For instance, a developer must first develop a working prototype before applying for the necessary International Organization for Standardization certifications. At present, research on using deep learning models as CAD tools has yet to attract end-users or to convince them to support the implementation of CAD tools in healthcare systems. As such, researchers must tackle the three main challenges listed above and improve the versatility of existing deep learning models. Only when end-users are satisfied with the outcome (i.e., explainability) and the benefits (i.e., accuracy of feature extraction) of a CAD tool will they become more willing to support its adoption in healthcare. Without such acceptance, research into CAD-based tools for automated detection of PD, and of other diseases, may continue to fall into the 'valley of death', where applied research accumulates without being translated into clinical practice, widening the gap between applied research and the clinical benefits it could deliver [83].

Solutions to Promote Adoption of CAD
Moving forward, with the aim of translating the potential benefits of deep learning methods into future clinical practice, researchers and end-users need to better understand that a CAD-based tool should not be positioned to replace the end-user's role in diagnosing the disease. This misunderstanding is common because deep learning and machine learning studies often report high performance for their proposed models without any end-user involvement, which creates the false notion that CAD tools will replace end-users. Instead, a CAD tool should aim to provide alternatives and additional opinions on the diagnosis for end-users to consider, thereby increasing their confidence while reducing errors. The adoption of CAD tools should therefore improve the efficiency of clinical diagnosis and further help predict possible diseases and identify alternative treatment options for end-users, such as clinicians, to consider in their day-to-day work. However, deep learning and machine learning models too often provide no information beyond their predicted results. A prediction tool that is supported neither by visible clinical features nor by a detailed explanation of how it arrived at its results may not be helpful to end-users. Hence, authors of future deep learning studies on the automated detection of PD, and of other diseases, should include visual cues, such as segmentation, as an explanatory function in their deep learning architecture. An example of the workflow that we propose for a practical CAD tool is illustrated in Figure 15.
In Figure 15, we present two alternatives. The first is to configure a deep learning model that performs diagnosis (i.e., identification of the ailment) and segmentation (i.e., explanation, or detailed information) simultaneously. The second is a two-stage approach: diagnosis is performed first, and segmentation is then applied only to the input images or signals diagnosed as PD in the first stage. In either case, it would be useful to provide additional information, such as the time frames of abnormal physiological signals or, for image analysis, the striatal volume and the percentage of dopaminergic neurons lost. Moreover, deep learning and machine learning models are composed of complex algorithms that neurologists may not fully understand. Visual cues could therefore compensate for the poor interpretability of deep learning models by allowing neurologists to 'see' what the model has identified as abnormal.
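The two alternatives in Figure 15 differ mainly in their control flow. The Python sketch below illustrates that difference under stated assumptions; it is not an implementation taken from any reviewed study. The callables `classify`, `segment`, and `joint_model` are hypothetical stand-ins for trained networks, inputs are assumed to be one-dimensional signals given as lists of floats, and a mask is assumed to hold one boolean per sample. The hypothetical helper `abnormal_intervals` shows one possible way to report the time frames of abnormal signal segments from such a mask.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

# Assumption: an input is a 1D signal (list of floats) and a mask holds one
# boolean per sample, True where the model flags the sample as abnormal.
Signal = Sequence[float]
Mask = List[bool]


@dataclass
class CadResult:
    label: str            # "PD" or "healthy"
    mask: Optional[Mask]  # visual cue for the clinician; None if not computed


def joint_cad(x: Signal,
              joint_model: Callable[[Signal], Tuple[float, Mask]],
              threshold: float = 0.5) -> CadResult:
    """Alternative 1: one model returns P(PD) and a mask in a single pass."""
    p_pd, mask = joint_model(x)
    return CadResult("PD" if p_pd >= threshold else "healthy", mask)


def two_stage_cad(x: Signal,
                  classify: Callable[[Signal], float],
                  segment: Callable[[Signal], Mask],
                  threshold: float = 0.5) -> CadResult:
    """Alternative 2: diagnose first; segment only inputs diagnosed as PD."""
    if classify(x) < threshold:
        return CadResult("healthy", None)  # stage 2 is skipped
    return CadResult("PD", segment(x))


def abnormal_intervals(mask: Mask) -> List[Tuple[int, int]]:
    """Convert a per-sample mask into half-open [start, end) index intervals."""
    intervals: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, flagged in enumerate(mask):
        if flagged and start is None:
            start = i                     # an abnormal run begins
        elif not flagged and start is not None:
            intervals.append((start, i))  # the run ends before sample i
            start = None
    if start is not None:
        intervals.append((start, len(mask)))  # run extends to the last sample
    return intervals


# Hypothetical placeholder "models" so the sketch runs end to end.
def toy_classify(x: Signal) -> float:
    return 0.9 if max(abs(v) for v in x) > 1.0 else 0.1


def toy_segment(x: Signal) -> Mask:
    return [abs(v) > 1.0 for v in x]


def toy_joint(x: Signal) -> Tuple[float, Mask]:
    return toy_classify(x), toy_segment(x)


if __name__ == "__main__":
    signal = [0.1, 0.2, 1.5, 1.7, 0.3, 1.2]
    result = two_stage_cad(signal, toy_classify, toy_segment)
    print(result.label, abnormal_intervals(result.mask or []))
    # prints: PD [(2, 4), (5, 6)]
```

In this sketch, the two-stage variant runs segmentation only on positive cases, which saves computation but produces no visual cue for inputs classified as healthy; the joint variant always returns a mask, so a clinician could also inspect negative cases.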
The provision of visual cues may contribute greatly to the acceptance of CAD tools in healthcare. As the behavioral trade-off matrix in Figure 16 shows, innovative products fall into one of several categories [84]. At present, neurologists rely on clinical features and visual inspection to diagnose PD. In contrast, the deep learning studies gathered in this review developed models with high prediction accuracy that are not accompanied by evidence supporting their diagnoses. This implies a large degree of behavioral and product change, as neurologists would have to forgo evidence-based diagnosis if they switched from visual inspection to relying on CAD tools for PD diagnosis. As a consequence, the deep learning models developed by the studies in this review fall into the 'Sure failures' category in Figure 16, which discourages their adoption into healthcare. Including visual cues in the deep learning model would reduce the degree of behavioral change to 'low', because the model segments the brain abnormalities, allowing neurologists to inspect the brain images with greater ease. It would also boost neurologists' confidence in deep learning models, especially when their own diagnosis coincides with the prediction of the CAD tool. Therefore, including visual cues as a function may move deep learning-based CAD tools from the 'Sure failures' category to 'Smash hits', which greatly encourages the adoption of CAD tools and supports both the short-term and long-term success of an innovative product [84].

With the acceptance of CAD-based tools, we hope to alleviate the manual workload of neurologists and other healthcare workers. Individuals affected by PD can also play a part by performing self-assessment with the aid of a CAD tool. This could encourage individuals to seek professional help when the CAD tool predicts PD and indicates that medical attention is required. Figure 17 is an example of a cloud-based CAD tool whose data can be accessed from any internet-connected electronic device, such as smartphones and computers. An individual who suspects that they may have PD can use their smartphone to perform a handwriting test, record their voice to detect speech abnormalities, or record a video of their gait cycle for gait analysis. These recordings provide useful evidence for the neurologist to confirm a diagnosis, which increases efficiency and reduces the waiting time for diagnosis. In addition, handwriting, speech, and gait analysis are potential telemonitoring alternatives. Brain imaging modalities such as SPECT, PET, and MRI require heavy machinery that cannot practically be placed at home, and devices for recording physiological signals such as EEG and EMG are not common in today's households either. Hence, it is more practical to monitor PD progression through a smartphone, which can capture handwriting, speech, and video. This review has only demonstrated that deep learning models are promising CAD tools for PD diagnosis. However, a practical CAD tool should ideally be able to identify multiple diseases rather than PD alone. We therefore hope that deep learning studies on other neurological diseases will also include visual cues as a function in their systems. In this way, deep learning models can be developed into clinically trusted CAD tools for clinical decision support, taking them a step further toward adoption in healthcare and a new phase of application in the health informatics industry.

Limitation of This Study
Despite its detailed synthesis of the most relevant information on deep learning methods for clinical diagnosis, this review has the following limitations.

•
Deep learning studies for each modality (MRI, EEG, speech, etc.) may use different datasets to train their models. For example, studies on MRI may use a private dataset instead of the public Parkinson's Progression Markers Initiative (PPMI) dataset. This makes it difficult to compare the performance of two deep learning models that were not trained on the same dataset.

•
There is a potential lack of studies on ultrasound imaging, small movement-related tests, and multimodal analysis, which involves more than one modality. This makes it difficult to determine the best-performing model for these three categories.

•
The wide variety of deep learning models proposed for gait analysis also makes it difficult to determine the best-performing model; in particular, it is difficult to choose among the top three: CNN-LSTM, DNN, and LSTM.

Conclusions
PD requires early diagnosis and intervention to minimize the impact of this degenerative condition and ensure that affected individuals can remain self-sufficient for as long as possible. However, the imprecise nature of clinical diagnosis and a worldwide lack of neurologists expert in PD diagnosis often result in delayed diagnosis and suboptimal management of PD. Moreover, the likely success of advanced therapeutics currently under development, such as gene therapy, will be heavily influenced by early diagnosis. Thus, CAD tools based on deep learning models that can perform fast and accurate PD diagnosis should be considered to alleviate the workload of neurologists. In this study, we have reviewed 63 studies on deep learning for various modalities, such as brain analysis (SPECT, PET, MRI, and EEG) and motion symptoms (gait, handwriting, speech, and EMG). We show that deep learning models can achieve high prediction accuracy for PD, especially the CNN model, which is widely proposed by studies focusing on image classification for brain imaging and handwriting analysis. The CNN model also performed well on one-dimensional signals such as EEG and speech. However, deep learning models have yet to be supported by end-users such as neurologists and other clinicians, due to a lack of supporting evidence for their disease predictions. Hence, this review proposes new directions for future deep learning studies, notably the inclusion of visual cues, such as segmentation of abnormal areas, as a function in the deep learning model architecture. We also urge researchers who build deep learning models for other disease detection problems to include visual cues in their models.
It is hoped that researchers will be encouraged to adopt more explainable and interpretable methods in deep learning-based CAD tools, which can then be taken up by end-users to improve healthcare outcomes for the growing number of individuals affected by PD worldwide.