
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

This paper presents the development and evaluation of a method for enabling quantitative and automatic scoring of alternating tapping performance of patients with Parkinson's disease (PD). Ten healthy elderly subjects and 95 patients in different clinical stages of PD used a touch-pad handheld computer to perform alternate tapping tests in their home environments. First, a neurologist used a web-based system to visually assess impairments in four tapping dimensions (‘speed’, ‘accuracy’, ‘fatigue’ and ‘arrhythmia’) and a global tapping severity (GTS). Second, tapping signals were processed with time series analysis and statistical methods to derive 24 quantitative parameters. Third, principal component analysis was used to reduce the dimensions of these parameters and to obtain scores for the four dimensions. Finally, a logistic regression classifier was trained using stratified 10-fold cross-validation to map the reduced parameters to the corresponding visually assessed GTS scores. Results showed that the computed scores correlated well with visually assessed scores and were significantly different across Unified Parkinson's Disease Rating Scale scores of upper limb motor performance. In addition, the computed scores had good internal consistency, discriminated well between healthy elderly subjects and patients in different disease stages, were sensitive to treatment interventions and could reflect natural disease progression over time. In conclusion, the automatic method can be useful to objectively assess the tapping performance of PD patients and can be included in telemedicine tools for remote monitoring of tapping.

The ability to perform functional upper limb motor tasks is essential for most activities of daily living. Patients diagnosed with Parkinson's disease (PD) often have difficulties with timing control and coordination of upper limb movements [

Measuring symptoms and treatment-related complications in advanced PD is challenging. In clinical settings today, quantification of PD symptoms is usually done by employing rating scales, like the Unified Parkinson's Disease Rating Scale (UPDRS), which is mainly based on observations and judgments by clinicians. During evaluation of symptoms, both the clinician- and patient-derived outcome measures offer complementary information. The gold standard approach to evaluate the severity of upper limb motor symptoms is to use the UPDRS-part III (motor examination), more specifically items #23 (Finger Tapping), #24 (Hand Movements) and #25 (Rapid Alternating Movements of Hands) [

Therefore, there is a need for objective, observer-independent measurements that may provide better resolution than clinical scales and thereby capture symptom severity and fluctuations more accurately. Quantitative measurement of upper limb motor performance of PD patients during finger tapping tests has previously been analyzed using a musical instrument keyboard [

This paper presents the development and evaluation of a method for enabling quantitative and automatic scoring of alternating tapping performance (ATP) of PD patients, using a touch-pad handheld computer designed for telemedicine. The paper reports several metrics for evaluating the quality of the method's assessments, including correlations with visually assessed ATP scores and UPDRS motor ratings, reliability, and sensitivity to treatment interventions and to natural PD progression over time. In addition, the ability of the method to discriminate between healthy elderly subjects and patients in different disease stages is reported.

The results presented in this paper are based on data from two clinical studies, both approved by the relevant agencies; written informed consent was given by the participants. In total, 95 patients in different clinical stages of PD and 10 healthy elderly (HE) subjects were assessed (

Sixty-five patients diagnosed with advanced PD were recruited in an open longitudinal 36-month study (Duodopa in Advanced Parkinson's: Health Outcomes & Net Economic Impact, EudraCT No. 2005-002654-21) at nine clinics around Sweden [

Both patients and HE subjects performed repeated and time-stamped assessments in their home environments using a telemetry test battery implemented on a touch-pad handheld computer [

Assessments with the test battery were performed four times per day during week-long test periods. In the Swedish study, the test battery was used quarterly for the first year and biannually for the second and third years. The LCIG-naïve patients used the test battery at baseline (while on oral treatment), at month 0 (first visit; at least 3 months after percutaneous endoscopic gastrostomy surgery), and at the follow-up test periods. In 23 LCIG-naïve patients, assessments with the test battery were available both during oral treatment and for at least one test period after the start of infusion treatment. Hence, n = 23 in the LCIG-naïve group.

In the Italian study, patients used the test battery for two test periods with a washout week in between. The HE subjects used the test battery for one test period. The total numbers of observations with the test battery were: Swedish group (n = 10,079), Italian F group (n = 822), Italian S group (n = 811) and HE (n = 299).

The development and evaluation of the method were mainly done using the Swedish dataset. To avoid onset and offset effects, data points collected during the first and last two seconds of each test were discarded. Hence, the time series of interest spanned the interval from 2 s to 18 s.
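A minimal sketch of this trimming step, assuming per-tap timestamps in seconds measured from test onset and a 20 s test (implied by discarding the first and last two seconds to leave 2 s to 18 s). The function name `trim_test_window` and the inclusive window boundaries are illustrative choices, not taken from the original implementation.

```python
import numpy as np

def trim_test_window(t, x, start=2.0, end=18.0):
    """Keep samples recorded between `start` and `end` seconds (inclusive).

    t: tap timestamps in seconds from test onset; x: any per-tap signal
    aligned with t (e.g. tap positions). Both are returned trimmed.
    """
    t = np.asarray(t, dtype=float)
    x = np.asarray(x)
    mask = (t >= start) & (t <= end)   # drop onset and offset effects
    return t[mask], x[mask]

# Example: a 20 s test with one tap every 0.5 s
t = np.arange(0.0, 20.0, 0.5)
x = np.arange(t.size)
t_mid, x_mid = trim_test_window(t, x)
print(t_mid[0], t_mid[-1], t_mid.size)  # 2.0 18.0 33
```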

A web-based system was developed to visualize the performance of patients during tapping tests and to allow users (PD specialists) to rate different tapping impairments [

The system retrieved time series of raw data from the database tables and depicted them in different types of graphs. Information presented included: (i) distribution of taps over the two fields; (ii) horizontal tap distance

In total, 24 quantitative parameters (

To quantify ‘speed’ during tapping tests, the following parameters were calculated and used in the subsequent analysis. The total number of taps (TNT) was defined as the number of taps during the middle 16 s of a test occasion. The mean tapping speed (MTS) was defined as the mean rate of change of tap distance with time, using the following Equation:

Hence, depending on the tapping direction, the following parameters were calculated: mean tapping speed from left to right (MTSLR), CV of tapping speed from left to right (CVTSLR), mean tapping speed from right to left (MTSRL) and CV of tapping speed from right to left (CVTSRL). Principal Component Analysis (PCA) based on the correlation matrix was applied to these parameters to reduce their dimensions and obtain a single parameter. The purpose of PCA is to take $n$ variables $X_1, X_2, \ldots, X_n$ and find linear combinations of them that produce principal components $Z_1, Z_2, \ldots, Z_n$ which are uncorrelated with each other.
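The speed parameters and the correlation-matrix PCA described above could be sketched as follows. This is a hypothetical illustration, not the authors' code: it assumes each tap is recorded as a timestamp `t` (s) and a horizontal position `x` (px), that taps alternate between the two fields, and that speed between consecutive taps is |Δx|/Δt. The function names are illustrative, and because the sign of a principal component is arbitrary, the first-PC score may need flipping so that a high score means good function.

```python
import numpy as np

def speed_parameters(t, x):
    """Hypothetical speed features for one test occasion.

    t: tap times (s) within the analysis window; x: horizontal tap positions (px).
    Assumes speed = |dx| / dt between consecutive taps.
    """
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    dx, dt = np.diff(x), np.diff(t)
    speed = np.abs(dx) / dt
    lr, rl = speed[dx > 0], speed[dx < 0]   # left-to-right / right-to-left moves

    def cv(v):
        return v.std(ddof=1) / v.mean()     # coefficient of variation

    return {"TNT": t.size, "MTS": speed.mean(),
            "MTSLR": lr.mean(), "CVTSLR": cv(lr),
            "MTSRL": rl.mean(), "CVTSRL": cv(rl)}

def first_pc_scores(features):
    """Scores on the first PC of the correlation matrix.

    features: (test occasions x parameters). Returns (scores, explained ratio).
    The sign of a PC is arbitrary and may need flipping for interpretation.
    """
    X = np.asarray(features, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardise columns
    R = np.corrcoef(X, rowvar=False)                   # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)               # ascending eigenvalues
    return Z @ eigvecs[:, -1], eigvals[-1] / eigvals.sum()
```

Stacking the speed parameters from many test occasions into one matrix (one row per occasion) and passing it to `first_pc_scores` yields a single speed score per occasion, analogous to A-SPEED.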

As stated above, this dimension reflects the subject's ability to tap the fields on the screen correctly and mainly focuses on coordination deficits. To quantify ‘accuracy’ during tapping, the following four parameters were calculated. To measure the overall precision while tapping the two fields over the test trial, the mean distance from the centers of the fields (MDCF) was calculated; for taps located within the area of a field, the distance was set to zero. The second parameter measures the regularity of precision over the test trial and is defined as the CV of distances from the field centers (CVDCF); the higher the CVDCF, the more irregular the tapping precision. To quantify the overall distribution of the taps (ODT) over the two fields, the variation (ratio between summed distance and total number of taps) was first calculated for each field, and the mean variation of the two fields was then computed. Finally, the overall tapping precision (OTP) was defined as the mean distance from the field centers irrespective of whether the taps were inside or outside the field areas, corrected for the total number of taps. After applying PCA to these four parameters, the first PC accounted for 65% of the variance in the data and was used to represent A-ACCURACY.
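Three of the accuracy parameters can be sketched as below. The sketch rests on assumptions not stated in the text: the two fields are treated as circles with known centers and a common radius, each tap is assigned to its nearest field, and ODT uses the raw (not zeroed) distances. OTP is omitted because its correction for the total number of taps is not fully specified here. `accuracy_parameters` is a hypothetical name.

```python
import numpy as np

def accuracy_parameters(taps, centers, radius):
    """Sketch of MDCF, CVDCF and ODT for taps (n x 2, px) on two circular fields.

    centers: (2 x 2) field centres; radius: field radius. Field shape and
    nearest-field assignment are illustrative assumptions. Both fields
    must receive at least one tap for ODT to be defined.
    """
    taps = np.asarray(taps, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # distance from every tap to both field centres, shape (n, 2)
    d_all = np.linalg.norm(taps[:, None, :] - centers[None, :, :], axis=2)
    field = d_all.argmin(axis=1)                 # nearest field per tap
    d = d_all[np.arange(len(taps)), field]       # distance to that centre
    d_out = np.where(d <= radius, 0.0, d)        # taps inside a field count as 0
    mdcf = d_out.mean()
    cvdcf = d_out.std(ddof=1) / mdcf if mdcf > 0 else 0.0
    # per-field summed distance divided by taps on that field, averaged over fields
    odt = np.mean([d[field == k].sum() / np.sum(field == k) for k in (0, 1)])
    return {"MDCF": mdcf, "CVDCF": cvdcf, "ODT": odt}
```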

‘Fatigue’ during tapping is usually characterized by a continued deterioration of tapping performance as the test trial progresses. The following parameters were defined to quantify the ‘fatigue’ dimension. The first parameter is the mean tapping speed per cycle (MTSPC), calculated using the following Equation:
_{i}_{+}_{1}_{i}

To quantitatively characterize any change in sequential irregularity (or aperiodicity) of the time series between the first and second part of the signal, the Approximate Entropy (ApEn) statistic was applied. ApEn measures the similarity between a chosen window of the time series of a given duration and the subsequent windows of the same duration. A time series containing a single frequency component has a relatively small ApEn value, whereas more complex time series containing multiple frequency components have high ApEn values because of their higher irregularity. A detailed description of the ApEn method can be found elsewhere [
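A compact sketch of ApEn in its standard form: windows of length m are compared with Chebyshev distance, self-matches are included so every count is positive, and ApEn = Φ^m − Φ^(m+1). The defaults m = 2 and r = 0.2 × SD are common choices in the ApEn literature and are assumptions here, not necessarily the values used in this study. The O(N²) memory use is fine for a few hundred taps.

```python
import numpy as np

def approximate_entropy(u, m=2, r=None):
    """Approximate entropy of a 1-D series (Pincus-style, self-matches included)."""
    u = np.asarray(u, dtype=float)
    if r is None:
        r = 0.2 * u.std()                     # assumed default tolerance

    def phi(k):
        # all overlapping windows of length k, shape (N - k + 1, k)
        x = np.array([u[i:i + k] for i in range(u.size - k + 1)])
        # Chebyshev distance between every pair of windows
        dist = np.max(np.abs(x[:, None, :] - x[None, :, :]), axis=2)
        c = np.mean(dist <= r, axis=1)        # fraction of windows within r
        return np.mean(np.log(c))

    return phi(m) - phi(m + 1)
```

Computing this separately for the first and second halves of a per-tap series and taking the difference would give a change-in-irregularity measure of the kind described above.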

The other two parameters used for measuring tapping ‘fatigue’ were based on the Dynamic Time Warping (DTW) method. DTW is an algorithm for comparing time series of different lengths and speeds by locally stretching or compressing them, that is, “warping” their time axes, so that the correspondence between data points in the two series is preserved. Given two discrete time series, an input $X = x_1, x_2, \ldots, x_N$ and a reference $Y = y_1, y_2, \ldots, y_N$, DTW computes the local distance $d(x_a, y_b)$ between points $x_a$ and $y_b$ and searches for the warping path $W$ that minimizes the total distance between the two series.
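The classic dynamic-programming form of DTW is short enough to sketch. This version assumes an absolute-difference local cost and no warping-window constraint; the original study's cost function and constraints are not stated in this passage, and `dtw_distance` is an illustrative name.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(N*M) dynamic time warping distance with |a_i - b_j| as local cost."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = a.size, b.size
    D = np.full((n + 1, m + 1), np.inf)   # cumulative cost matrix
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: insertion, deletion, match
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

For a ‘fatigue’ measure, the per-tap series (for example speed) from the first half of the test could be compared with that of the second half; a time-stretched but otherwise identical second half yields a distance of zero, so only changes in shape contribute.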

In our work, the

The last parameter is designed to measure the overall trend of tapping reaction time over the test trial by calculating the mean correlation coefficient over jackknife (leave-one-out) samples between
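A minimal sketch of a jackknife mean correlation. Which pair of variables the original parameter correlates is not stated in this passage; the sketch assumes paired arrays such as tap time and reaction time (taken, for illustration, as the interval between consecutive taps). `jackknife_mean_correlation` is a hypothetical helper.

```python
import numpy as np

def jackknife_mean_correlation(x, y):
    """Mean Pearson r over the n leave-one-out subsamples of paired data (x, y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    rs = []
    for i in range(n):
        keep = np.arange(n) != i          # drop observation i
        rs.append(np.corrcoef(x[keep], y[keep])[0, 1])
    return float(np.mean(rs))

# Illustrative use: a positive value suggests reaction time grows during the test
t = np.array([0.0, 0.4, 0.8, 1.3, 1.8, 2.4, 3.0, 3.7])
rt = np.diff(t)                           # assumed "reaction time" per tap
trend = jackknife_mean_correlation(t[1:], rt)
```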

‘Arrhythmia’ in tapping is characterized by serial irregularity in tapping performance, with unpredictable behavior and abruptly changing patterns. To quantitatively measure tapping ‘arrhythmia’, the following parameters were calculated. The first two parameters were based on the ApEn method. To quantify the presence of serial irregularity in tapping speed and vertical tap distance over the test trial, ApEn (with

In order to measure variation in distance between the two fields on the screen, the shimmer measure was calculated. Initially, a zero-crossing signal _{1}_{…}_{n}_{c}_{c}_{t}_{t}_{i}_{i}

The standard deviation of _{i}_{i}_{i}

The mean and standard deviation of

A clinician, rating ‘arrhythmia’ using visualized graphs, would rate a sample as normal if he observes periodic patterns. On the other hand, he would rate a sample as extremely severe arrhythmic if he notices aperiodic patterns. The new parameter called the cross-correlation between the slopes (CCBS) quantitatively measures this aperiodicity by initially creating an artificial perfectly-periodic slope (PPS) signal using the _{i}_{avg}_{avg}

The PPS signal was then constructed using both distance and time series signals as follows:

The original slope signal (OS) was then computed from the

The last parameter to quantitatively measure ‘arrhythmia’ was based on the cross-approximate entropy (Cross-ApEn) between PPS and OS. The Cross-ApEn quantifies the regularity of patterns in a pair of time series [
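A sketch of Cross-ApEn in its standard form, where windows of one series serve as templates matched against windows of the other. Assumptions: both series are z-scored so that r is in standard-deviation units, m = 2 and r = 0.2 are illustrative defaults, and neither input is constant. When some template finds no match the logarithm is undefined, so the sketch returns NaN. Applied to PPS and OS, a lower value would suggest that the observed slopes follow the periodic reference more closely. `cross_apen` is a hypothetical name.

```python
import numpy as np

def _windows(s, k):
    """All overlapping windows of length k, shape (len(s) - k + 1, k)."""
    return np.array([s[i:i + k] for i in range(s.size - k + 1)])

def cross_apen(u, v, m=2, r=0.2):
    """Cross-approximate entropy of v with respect to u (both z-scored)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    u = (u - u.mean()) / u.std()          # r is then in SD units (assumption)
    v = (v - v.mean()) / v.std()

    def phi(k):
        xu, xv = _windows(u, k), _windows(v, k)
        # Chebyshev distance from each template of u to each window of v
        dist = np.max(np.abs(xu[:, None, :] - xv[None, :, :]), axis=2)
        c = np.mean(dist <= r, axis=1)
        if np.any(c == 0):
            return np.nan                 # undefined when a template has no match
        return np.mean(np.log(c))

    return phi(m) - phi(m + 1)
```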

To classify ATP into the five GTS levels, a simple logistic regression model was used as a classifier to map the extracted quantitative parameters to the corresponding V-GTS scores. First, PCA was applied to all the extracted parameters to reduce their dimensions without much loss of variance in the data. An important step when applying PCA is to identify and retain the components that account for a large proportion of the total variance. In this work, the number of “significant” components was chosen so that the retained PCs together accounted for more than 70% of the total variance in the original data. Applying this criterion resulted in retention of the first 5 PCs, which were used as predictors in the subsequent regression analysis (
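The 70% retention rule can be sketched as follows, assuming PCA on the correlation matrix of the parameter matrix (observations x parameters). The helper also returns each parameter's percentage contribution to each retained PC, computed here as the squared loading times 100; this is one common definition of "contribution" and is assumed, not confirmed, to match the table of PC contributions. Function names are illustrative.

```python
import numpy as np

def n_components(ratio, threshold=0.70):
    """Smallest k such that the first k explained-variance ratios sum to > threshold."""
    return int(np.argmax(np.cumsum(ratio) > threshold)) + 1

def select_components(X, threshold=0.70):
    """Correlation-matrix PCA with a cumulative-variance retention rule.

    Returns (k, explained ratios sorted descending, contributions), where
    contributions[j, c] is the % contribution of parameter j to PC c.
    """
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]            # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = eigvals / eigvals.sum()
    k = n_components(ratio, threshold)
    contributions = 100 * eigvecs[:, :k] ** 2    # unit-norm loadings -> columns sum to 100
    return k, ratio, contributions

# With the variance shares reported for the first PCs (39, 16, 7, 6, 4 %),
# the cumulative share first exceeds 70% at the fifth component.
k_reported = n_components(np.array([0.39, 0.16, 0.07, 0.06, 0.04, 0.28]))
```

The retained PC scores could then serve as predictors for a multinomial logistic regression, for example scikit-learn's `LogisticRegression` evaluated with `StratifiedKFold(n_splits=10)`; that step is not shown here.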

Agreement between V-GTS and A-GTS was evaluated using the area under the receiver operating characteristic curve (AUC) and weighted Kappa statistics as the main performance measures. Stratified 10-fold cross-validation (also known as rotation estimation) was applied to assess how well the logistic regression classifier would generalize to independent data sets. Spearman's rank correlation coefficients were used to assess monotonic relationships between computed and visual scores. Reliability

Agreement between V-GTS and A-GTS was very good, with a weighted Kappa coefficient of 0.87 (p < 0.001) and a weighted AUC of 0.86 (

The internal consistency among the four automated dimensions was acceptable (Cronbach's
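For reference, Cronbach's alpha for k items is α = k/(k − 1) · (1 − Σ item variances / variance of the summed score). A minimal sketch, assuming the items are the four automated dimension scores with one row per test occasion; `cronbach_alpha` is an illustrative name.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (observations x items) score matrix."""
    X = np.asarray(scores, dtype=float)
    k = X.shape[1]
    item_var = X.var(axis=0, ddof=1).sum()     # sum of per-item variances
    total_var = X.sum(axis=1).var(ddof=1)      # variance of the summed score
    return k / (k - 1) * (1 - item_var / total_var)
```

Identical items give α = 1, while items that vary independently of each other drive α towards 0.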

Mean computed scores of the LCIG-naïve patients from baseline to the 36-month follow-up are shown in

In this study, we showed that quantitative and objective measures of ATP on a touch-pad test battery are valid measures of upper limb motor performance in PD. The majority of these measures had strong and significant correlations with visually assessed and clinical scores, suggesting that they capture important information on symptom severity in ATP. The regression-based classifier rated the GTS of patients on a 0 (normal) to 4 (extremely severe) scale in good agreement with the neurologist, with a weighted Kappa coefficient of 0.87 and a weighted AUC of 0.86. The strongest correlations between computed and visual scores were seen for GTS (0.91), ‘speed’ (0.89) and ‘accuracy’ (0.77). However, correlations were moderate for ‘arrhythmia’ (0.57) and weak for ‘fatigue’ (0.38). This could possibly be improved by adding more parameters measuring these two dimensions.

The main idea behind defining the four dimensions was to measure the severity of symptoms during tapping tasks as being represented in the items #23–#25 of the UPDRS scale [

The rationale for including the Italian and HE datasets in the analysis was to add data from early-stage PD patients and from healthy elderly subjects, respectively, to the data from the advanced PD patients in the Swedish study. Although these datasets were small, they assist in interpreting the presented results. Furthermore, the use of linear mixed-effects models allowed us to use all available data, account for within- and between-subject dependencies, and model mean computed scores with subject ID as a random effect [

The PCA for the 24 parameters showed that the ATP could be explained by only 5 components (

The method was also shown to be sensitive to treatment interventions. The significant improvements in mean A-SPEED, A-ACCURACY and A-GTS scores indicate that the method could measure motor symptom improvements with LCIG that were sustained for at least 24 months. These changes were also documented with the clinical rating scales [

A limitation of the present study is that the clinical evaluation of ATP was done by visual inspection of graphs rather than by live or video observation of the patient's performance. However, such observations may also be biased by within- and between-clinician variability in ratings. In the study performed by Heldman

PD is a multidimensional and complex disorder involving both motor and non-motor symptoms. The overall well-being and quality of life of PD patients have been shown to be strongly influenced by non-motor symptoms and only weakly by motor symptoms [

In summary, the method we developed for the alternate tapping test is appropriate for quantitatively and objectively assessing the severity of ATP in PD patients. The clinimetric properties,

This work was performed in the framework of the PAULINA project, funded by the Swedish Knowledge Foundation, Nordforce Technology AB, Stockholm, Sweden, Animech AB, Uppsala, Sweden and Dalarna University, Borlänge, Sweden.

Mevludin Memedi, Taha Khan, Dag Nyholm and Jerker Westin are shareholders in Jemardator AB, provider of a test battery for collecting symptom data.

^{®}) in patients with advanced Parkinson's disease

Illustration of the alternating tapping test using the telemetry device.

Two illustrative examples of visualized ATP in the web-based system. (


Mean scores of the first PC for each category of items #23 (Finger Tapping), #24 (Hand Movements) and #25 (Rapid Alternating Movements of Hands) of the UPDRS scale, corrected for individual subject variation using linear mixed-effects models. Y-axis: a high score means good function. UPDRS items: 0 (Normal), 1 (Mild slowing and/or reduction in amplitude), 2 (Moderately impaired), 3 (Severely impaired) and 4 (Can barely perform the task). P-values and % changes are shown with respect to category 0. UPDRS item #23: 0 (n = 2323), 1 (n = 3649), 2 (n = 2793), 3 (n = 687), 4 (n = 25). UPDRS item #24: 0 (n = 2532), 1 (n = 3964), 2 (n = 2441), 3 (n = 491), 4 (n = 49). UPDRS item #25: 0 (n = 2201), 1 (n = 4085), 2 (n = 2511), 3 (n = 598), 4 (n = 82). Note that category 4 of the UPDRS items had a small number of observations, obtained from very few patients.

Trends of mean computed scores of LCIG-naïve patients over the 36-month study period, corrected for individual subject variation using linear mixed-effects models. Y-axis: a high score for A-SPEED, A-ACCURACY, A-FATIGUE and A-ARRHYTHMIA means good function, whereas a high score for A-GTS indicates more severe impairment. P-values are shown with respect to the baseline (−3) test period. Test period: −3, baseline (n = 507); 0 (n = 506); 3 (n = 468); 6 (n = 389); 9 (n = 417); 12 (n = 362); 18 (n = 296); 24 (n = 322); 30 (n = 227); 36 (n = 108).

Mean scores of the four automated dimensions for each group, corrected for individual subject variation using mixed-effects models (black line: right hand; red line: left hand). Y-axis: a high score means good function. Group: healthy elderly, HE (n = 286); Italian S (n = 806); Italian F (n = 784); Swedish (n = 9,531).

Characteristics of PD patients and of healthy elderly participants, presented as median ± interquartile range.

Patients (n, gender) | 65 (43 m; 22 f) | 15 (13 m; 2 f) | 15 (13 m; 2 f) | 10 (5 m; 5 f) |

Age (years) | 65 ± 11 | 65 ± 6 | 65 ± 6 | 61 ± 7 |

Years on levodopa | 13 ± 7 | 7 ± 8.5 | 5.5 ± 6 | NA (not applicable) |

Hoehn and Yahr stage at present | 2.5 ± 1 | 2 ± 0 | 2 ± 0.5 | NA (not applicable) |

Total UPDRS | 49 ± 20.5 | 33.5 ± 11.8 | 26 ± 16.5 | NA (not applicable) |

Assessments performed in afternoons;

Assessments performed in on-state.

Percentage total variance accounted for by first 5 PCs and contributions of the 24 parameters in each one of them. Details of the parameters are discussed in the text.

Parameter | Dimension | PC1 (%) | PC2 (%) | PC3 (%) | PC4 (%) | PC5 (%) |
---|---|---|---|---|---|---|
TNT | ‘speed’ | 6.8 | 0.2 | 1.9 | 4.3 | 2 |
MTS | ‘speed’ | 6.9 | 3.5 | 0.3 | 1 | 0.7 |
MTSLR | ‘speed’ | 6.6 | 3.5 | 1.9 | 2 | 4.1 |
CVTSLR | ‘speed’ | 2.7 | 5.9 | 4.9 | 1.5 | 14 |
MTSRL | ‘speed’ | 6.7 | 3.1 | 1.7 | 0.3 | 3.7 |
CVTSRL | ‘speed’ | 2.5 | 7.7 | 2.2 | 7.4 | 3 |
MDCF | ‘accuracy’ | 3.3 | 6.2 | 6.8 | 8 | 1.8 |
CVDCF | ‘accuracy’ | 1.4 | 5.8 | 8.8 | 9.7 | 0.2 |
ODT | ‘accuracy’ | 5.5 | 5.5 | 0.5 | 2.3 | 0.2 |
OTP | ‘accuracy’ | 6.1 | 4.5 | 2 | 1.1 | 1.2 |
MTSPC | ‘fatigue’ | 6.9 | 1.5 | 0.7 | 3.2 | 2.2 |
DDT12 | ‘fatigue’ | 4 | 6.1 | 9.5 | 1.5 | 6.2 |
DMTSPC12 | ‘fatigue’ | 2.8 | 8.1 | 1.4 | 5.3 | 0.6 |
DAEDT12 | ‘fatigue’ | 1.6 | 0.8 | 8 | 7 | 12.4 |
DTWMTS12 | ‘fatigue’ | 4.4 | 7.2 | 0.3 | 7.9 | 1.6 |
DTWDT12 | ‘fatigue’ | 3 | 5.5 | 10.6 | 4.9 | 10.1 |
MCDTT | ‘fatigue’ | 0.4 | 0.3 | 12.6 | 4.7 | 4.6 |
AEMTS | ‘arrhythmia’ | 4.9 | 1.2 | 0.3 | 10.1 | 3.8 |
AEY | ‘arrhythmia’ | 5.3 | 3.3 | 2.8 | 4.7 | 3.6 |
SDSHIM | ‘arrhythmia’ | 4.6 | 3.2 | 2.3 | 1.3 | 6.6 |
MJVIS | ‘arrhythmia’ | 1.6 | 0.3 | 11.4 | 2.2 | 2.4 |
SDVJS | ‘arrhythmia’ | 4.2 | 4.9 | 4.3 | 2.1 | 7.3 |
CCBS | ‘arrhythmia’ | 6 | 4.7 | 1.6 | 2.4 | 0.7 |
CABS | ‘arrhythmia’ | 1.8 | 6.9 | 3.3 | 5 | 6.8 |
Total variance (%) | NA (not applicable) | 39 | 16 | 7 | 6 | 4 |

Assessments of GTS for the computer method and human rater. The computed scores are derived after applying 10-fold cross validation on the logistic regression classifier.

GTS score | 0 | 1 | 2 | 3 | 4 | Total |
---|---|---|---|---|---|---|
0 | 14 | 6 | 0 | 0 | 0 | 20 |
1 | 7 | 10 | 4 | 0 | 0 | 21 |
2 | 2 | 5 | 7 | 6 | 0 | 20 |
3 | 0 | 0 | 8 | 8 | 3 | 19 |
4 | 0 | 0 | 0 | 3 | 14 | 17 |
Total | 23 | 21 | 19 | 17 | 17 | 97 |
AUC | 0.93 | 0.82 | 0.74 | 0.85 | 0.95 | 0.86 (weighted) |
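As a consistency check, the weighted Kappa can be recomputed from the counts in the GTS table above. The weighting scheme is not stated in this section; quadratic weights, used in the sketch below, give approximately 0.87 and thus reproduce the reported value, whereas the unweighted Kappa for the same counts would be considerably lower. `weighted_kappa` is an illustrative helper, not the authors' code; scikit-learn's `cohen_kappa_score` with `weights="quadratic"` computes the same statistic from paired label vectors.

```python
import numpy as np

def weighted_kappa(confusion, weights="quadratic"):
    """Cohen's weighted Kappa from a k x k confusion matrix of counts."""
    O = np.asarray(confusion, dtype=float)
    k = O.shape[0]
    i, j = np.indices((k, k))
    if weights == "quadratic":
        W = (i - j) ** 2 / (k - 1) ** 2       # disagreement weights
    else:                                     # linear weights
        W = np.abs(i - j) / (k - 1)
    # expected counts if the two raters were independent
    E = np.outer(O.sum(axis=1), O.sum(axis=0)) / O.sum()
    return 1 - (W * O).sum() / (W * E).sum()

gts = [[14, 6, 0, 0, 0],
       [7, 10, 4, 0, 0],
       [2, 5, 7, 6, 0],
       [0, 0, 8, 8, 3],
       [0, 0, 0, 3, 14]]
print(round(weighted_kappa(gts), 2))  # 0.87
```

Because the weights are symmetric, the result does not depend on which axis of the table holds the visual and which the computed scores.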

Absolute Spearman rank correlations between computed and visual scores.

Score | A-GTS | A-SPEED | A-ACCURACY | A-FATIGUE | A-ARRHYTHMIA |  |
---|---|---|---|---|---|---|
A-GTS | 1 | 0.69 | 0.58 | 0.56 | 0.72 | 0.92 |
V-GTS | 0.91 | 0.89 | 0.55 | 0.62 | 0.65 | 0.88 |
V-SPEED | 0.89 | 0.89 | 0.65 | 0.81 | 0.80 | 0.91 |
V-ACCURACY | 0.59 | 0.39 | 0.77 | 0.32 | 0.54 | 0.55 |
V-FATIGUE | 0.57 | 0.49 | 0.53 | 0.38 | 0.54 | 0.58 |
V-ARRHYTHMIA | 0.63 | 0.34 | 0.64 | 0.54 | 0.57 | 0.55 |