1. Introduction
Parkinson’s Disease (PD) is the second most common neurodegenerative disease, affecting over 6 million people. It affects movement, cognition, and the autonomic nervous system, and it causes neuropsychiatric symptoms. Clinical interest focuses on disturbances of movement because they cause significant disability, they can be treated, and they dominate the early years after diagnosis. The key diagnostic feature of PD is slowness of movement, known as bradykinesia, which is caused by a loss of dopamine transmission in the striatum. Bradykinesia can be treated effectively with dopaminergic medications, including levodopa. In the first few years, PD responds well to treatment [1,2], but the duration of symptomatic benefit derived from each levodopa dose begins to shorten in ~50% of people with PD (PwP) after two years of disease, and ~70% of PwP eventually experience this loss of therapeutic benefit, known as “wearing-off”, “off” periods, or fluctuations [3,4]. At about the same time, involuntary movements at the peak of the levodopa dose, known as dyskinesia, begin to emerge. Much of the therapeutic effort is directed at reducing these fluctuations between bradykinesia and dyskinesia because they lead to disability and impaired quality of life. Initially, this can be achieved by adjusting oral therapies, but device-assisted therapies (DAT) such as deep brain stimulation provide superior results in many PwP.
Despite broad consensus as to the criteria for selecting DAT candidates [5,6], non-specialists have difficulty recognizing these criteria. Many PwP in whom fluctuations are emerging are managed by non-specialists, and consequently, suitable DAT candidates are not referred in a timely manner [7]. For example, the timing of deep brain stimulation (DBS) is important because there is a window of optimum benefit [8,9], and delay means that suitable candidates may derive a shorter benefit from DBS or, worse still, miss out entirely. As many as 67% of patients referred for DBS are unsuitable for the procedure [5,10], yet only 1% of people with PD receive DBS [11], even though as many as 20% may, in fact, be eligible [6]. One reason is that fluctuations, a key indication for DBS and other DAT [12], are frequently overlooked by both patient and clinician [13,14,15]. The motor indication for considering any DAT is similar [16], consisting of troublesome periods of bradykinesia (“off” periods) or dyskinesia that cannot be addressed by optimal deployment of oral therapies. Responsiveness to levodopa is important, but often an increased number of daily levodopa doses is required. Unquestionably, age and cognition influence the type of and threshold for DAT, but these should be addressed at the specialist referral center rather than serve as a reason to delay referral when oral therapies do not control troublesome “off” time and/or dyskinesia. Thus, a screening aid could help ensure that suitable candidates are referred to specialist centers for full consideration of all the factors that influence suitability for DAT, without burdening these centers with too many unsuitable cases [5,10]. Our interest in this study was whether a classifier that provided this screening aid could be built using recently developed wearable sensors for PD [17,18]. The assessment of PwP, including suitability for DAT, is currently based entirely on clinical skills. Recent developments in the objective measurement of the motor features of PD [17,18] raise the possibility of using data from wearable sensors to build an instrumented classifier to assist in the detection of DAT candidates. To our knowledge, our previous pilot study has been the only attempt to do this [19].
The Parkinson’s KinetiGraph (PKG), described further in the methods section, is a wearable sensor system that provides objective scores of the motor features of PD, including bradykinesia, dyskinesia, and fluctuations. Preliminary data suggest that it substantially improves the recognition of fluctuations [15]. This should lend itself to a machine learning approach to recognizing DAT candidates. In a previous pilot study [19], 36 PwP were classified on motor grounds as either DBS candidates or unsuitable candidates by clinicians who were expert in DBS. The information from the PKG obtained at the time of classification was used to model this decision and build a predictive score [19]. Although this score had high sensitivity and specificity on its original training set, it was not formally tested on an independent test cohort. Furthermore, only small differences in the score separated people who were DBS candidates on motor grounds from those who were unsuitable. A score that addressed these issues by providing a likelihood or risk (in a statistical sense) of requiring DAT might better address the needs of the non-specialist referrer.
In the study reported here, a new score (a DAT classifier score) that predicted the likelihood of a PwP being a DAT candidate was developed. Establishing whether a PwP should be considered for DAT (or not) is a classification problem where the benchmark is the opinion of the expert clinician. There are well established processes for building classifiers that automate human decision making. We outline the steps here to aid the reader who is less familiar with these processes and to foreshadow the results in this paper.
The first step in building a classifier is to identify construction and test sets (of PwP, in our case). The construction set was used to train, develop, cross-validate, and evaluate the performance of the fully specified classifier, which was then re-tested on the test set. Ideally, the construction and test sets should be randomly selected from the same underlying population of PwP. The second key step is the choice and refinement of PKG variables. While there are many options, we chose candidate PKG variables intuitively, based on information from the literature (see reference [16] for a review) and on experience gained in developing the earlier classifier described above [19]. We then used statistical methods (joint mutual information) to determine which of these variables carried the most information about the classification of a subject as meeting the criteria for DAT (criteria positive (CP)) or not meeting the criteria for DAT (criteria negative (CN)). This shortened the list of candidate PKG variables to those containing the most relevant, non-redundant information about the CP and CN classes. The next step was to build a model that used these variables to predict the clinical classification. For the design of the classification model pipeline, k-fold cross-validation was performed on the construction set, which repeatedly trains the model on a subset of the construction set and evaluates it on the held-out remainder; the area under the receiver operating characteristic (ROC) curve (AUC) was used as the performance criterion. ROC analysis is an appropriate measure here because the classes are balanced. Different elements throughout the pipeline were iteratively modified until the best performance on the ROC curve was obtained (judged by AUC and by sensitivity vs. specificity). Only after the performance of the model on the construction set was optimal (described in detail in the Results section) was it tested on the separate test set, with the expectation of achieving low variance, suggesting generalizability of the model to unseen data.
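Joint mutual information (JMI) feature selection can be illustrated with a short greedy procedure: pick the variable with the highest mutual information with the class, then repeatedly add the candidate whose pairing with each already-selected variable carries the most joint information about the class. The sketch below is a minimal pure-Python illustration on discretized features, not the implementation used in this study; the feature names (`feat_a`, `feat_a_copy`, `feat_b`) and toy data are hypothetical, and continuous PKG scores would first need to be binned.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (bits) of a sequence of hashable values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_info(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for discrete sequences."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def jmi_select(features, labels, k):
    """Greedy joint mutual information feature selection.

    features: dict mapping a feature name to a list of discrete values.
    labels:   list of class labels (e.g. "CP" / "CN").
    Returns up to k feature names in the order they were selected.
    """
    remaining = set(features)
    # First pick: the single feature with the highest I(X;Y).
    first = max(sorted(remaining), key=lambda f: mutual_info(features[f], labels))
    selected = [first]
    remaining.remove(first)
    while remaining and len(selected) < k:
        # JMI score of candidate f: sum over selected s of I((X_f, X_s); Y).
        def jmi_score(f):
            return sum(
                mutual_info(list(zip(features[f], features[s])), labels)
                for s in selected
            )

        best = max(sorted(remaining), key=jmi_score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: feat_b is useless alone but complements feat_a,
# while feat_a_copy is fully redundant with feat_a.
labels = ["CP"] * 4 + ["CN"] * 4
features = {
    "feat_a": [1, 1, 1, 0, 0, 0, 0, 1],
    "feat_a_copy": [1, 1, 1, 0, 0, 0, 0, 1],
    "feat_b": [0, 0, 0, 1, 0, 0, 0, 1],
}
print(jmi_select(features, labels, 2))
```

The toy example shows why JMI suits this step: the redundant copy is passed over in favor of a variable that adds new information about the CP/CN split, even though that variable carries no information on its own.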
When this stage was reached, we could say, at one level, that the classifier had been validated. However, further steps were required to obtain insights into the clinical validity and limitations of the classifier. Any classifier will have errors, and before it is used in clinical decision making, the nature of these errors should be understood. Thus, there is value in examining cases that were either incorrectly overlooked as DAT candidates (false negatives) or incorrectly identified as DAT candidates (false positives). Understanding the reasons for false negatives and positives not only informs clinicians using the classifier but also aids the future development of the classifier. Conceptually, a PwP with excessive periods of bradykinesia or dyskinesia would be classified (by the classifier) as “suitable” for DAT, yet this classification does not address whether the excessive periods of bradykinesia or dyskinesia can be resolved by manipulating oral therapies or whether DAT is required. This is encapsulated by the general recommendation that DAT is indicated when there are excessive periods of bradykinesia that cannot be addressed by manipulating oral therapies. The findings suggest that, in real-world practice, a clinician using a support algorithm would recommend DAT once confident that the score could not be improved by manipulating oral therapy. We tested this by examining the management of a population of PwP to see how many of those with excessive periods of bradykinesia or dyskinesia, which would otherwise indicate suitability for DAT, could be improved by manipulating oral therapies. The DAT classifier score produced using machine learning shows promise as a tool to guide clinicians as to when PwP have reached the point at which referral to an expert center is timely.
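Reviewing false negatives and false positives starts from the confusion matrix at a chosen score threshold, while the AUC used during cross-validation summarizes performance across all thresholds. The sketch below assumes hypothetical subject IDs, scores, and a hypothetical threshold of 50; none of these values come from the study. It computes the AUC as the Mann-Whitney probability that a CP subject outscores a CN subject, and lists the misclassified subjects so that their records can be examined.

```python
def confusion(labels, scores, threshold):
    """Split subjects into TP/FP/TN/FN lists at a score threshold.

    labels: dict subject_id -> True if clinicians classed the subject as CP.
    scores: dict subject_id -> classifier score (higher = more likely CP).
    """
    cells = {"TP": [], "FP": [], "TN": [], "FN": []}
    for sid in sorted(labels):
        predicted = scores[sid] >= threshold
        if predicted and labels[sid]:
            cells["TP"].append(sid)
        elif predicted:
            cells["FP"].append(sid)
        elif labels[sid]:
            cells["FN"].append(sid)
        else:
            cells["TN"].append(sid)
    return cells


def sensitivity_specificity(cells):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP)."""
    tp, fn = len(cells["TP"]), len(cells["FN"])
    tn, fp = len(cells["TN"]), len(cells["FP"])
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels, scores):
    """ROC AUC as the probability that a CP subject outscores a CN
    subject (Mann-Whitney formulation; ties count one half)."""
    pos = [scores[s] for s in labels if labels[s]]
    neg = [scores[s] for s in labels if not labels[s]]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical example data (not study results).
labels = {"S1": True, "S2": True, "S3": True, "S4": True,
          "S5": False, "S6": False, "S7": False, "S8": False}
scores = {"S1": 82, "S2": 67, "S3": 45, "S4": 91,
          "S5": 30, "S6": 55, "S7": 12, "S8": 38}

cells = confusion(labels, scores, threshold=50)
print("Review false negatives:", cells["FN"])
print("Review false positives:", cells["FP"])
print("Sensitivity, specificity:", sensitivity_specificity(cells))
print("AUC:", auc(labels, scores))
```

In this toy case, the AUC is high even though one CP and one CN subject fall on the wrong side of the threshold; those two subjects are exactly the cases the text argues should be examined individually.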
5. Conclusions
PD initially presents with bradykinesia, which is relatively simple to treat in the first few years. After that time, management becomes challenging because of fluctuations between bradykinesia and dyskinesia with each dose and a shortening of the dose effect to approximately three hours. While DAT is among the most effective means of managing this stage of PD, many PwP for whom this therapy would be appropriate miss out because their managing clinician fails to recognize the indications. The PKG system was used because it provides objective measures of the severity of bradykinesia and dyskinesia, time “off”, PTD, and frequency of dosing, which are the same measures that clinicians extract by history to establish whether there are excessive periods of bradykinesia or dyskinesia that cannot be reduced by manipulating oral therapies. Thus, it was likely to provide input features that could be used to build a DAT classifier score of the likelihood that a PwP meets the motor indications for DAT, as determined by clinical classification. The main findings of the study were as follows.
The information from the PKG could be used to build a classifier that distinguished, with high sensitivity and specificity, PwP whom specialist clinicians had identified as meeting the criteria for DAT from those who did not. Thus, the DAT classifier score was successful in identifying PwP who met the first criterion for DAT suitability: having excessive periods of bradykinesia and/or dyskinesia.
The DAT classifier score correctly classified 87% of subjects who had already been preselected for DBS. The remaining misclassified cases may not have been considered suitable by all specialists. This is in keeping with the current discussion in the movement disorder specialty around how early in the disease DBS is a suitable therapy.
PwP whose excessive periods of bradykinesia or dyskinesia could not be corrected by oral therapy met the second criterion for DAT: that is, excessive periods of bradykinesia and/or dyskinesia that could not be reduced by manipulating oral therapies. The DAT classifier captured this second criterion because the scores remained high despite efforts to optimize oral therapy. Thus, a clinician using the DAT classifier score would have correctly identified these subjects as eligible for DAT.
An effective DAT classifier score, used to follow PwP from diagnosis to the onset of excessive periods of bradykinesia or dyskinesia that could not be corrected by oral therapy, should show a commensurate change as a movement disorder specialist becomes more likely to consider introducing DAT. The DAT classifier performed well under these circumstances.
The change in DAT classifier scores following a therapeutic intervention has led us to propose that there is a predictable change in the score if the intervention is successful.
Figure 3e might suggest that the response follows Δ DAT classifier score = DAT classifier score (before intervention) − 20, where Δ DAT classifier score is the score before intervention minus the score after intervention. A response that does not follow this pattern indicates either the need for DAT or a failure in therapeutic administration.
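Taken literally, and assuming that Δ is the score before intervention minus the score after it, the relation suggested by Figure 3e can be rearranged to show what it predicts for the post-intervention score. The symbols $S_{\text{before}}$, $S_{\text{after}}$, and $\Delta S$ are notation introduced here for the DAT classifier scores:

```latex
\begin{aligned}
\Delta S &= S_{\text{before}} - S_{\text{after}} && \text{(definition of the change)}\\
\Delta S &= S_{\text{before}} - 20 && \text{(relation suggested by Figure 3e)}\\
\Rightarrow\quad S_{\text{after}} &= 20
\end{aligned}
```

Under this reading, a successful intervention would be expected to bring the DAT classifier score to roughly 20 regardless of its starting value, so a post-intervention score that stays well above 20 would be the kind of departure from the pattern that the text associates with a need for DAT or a failure in therapeutic administration.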
Further studies are required to establish whether this optimism is justified. The DAT classifier score described here has been modelled on the clinical decisions of a relatively small number of clinicians operating out of only a few clinics in one country. However, it is important to understand that the aim is not to model the behavior of the average clinician but to model clinical behavior that results in good outcomes for PwP.