4.1. Feature Extraction Approaches
The spectrum fit approach was applied to three different data sources: orientation, accelerations, and angular rates. The correlation indexes and significance test results support the hypothesis that the data sources are interchangeable, which could reduce the computational cost of an embedded classification algorithm. The estimated peak centers fall within the expected frequency range for both Parkinsonian and essential tremors. Moreover, data collected during the resting task show a predominance of Parkinsonian tremors over ET, whereas data from the postural and kinetic tasks highlighted the presence of essential tremor in the subject, as expected. Nevertheless, this feature extraction approach suffers from several issues due to the Gaussian curve fitting.
One of the main problems is that this technique relies on the hypothesis that a tremor spectrum can always be described by a bell-shaped curve. However, this is not always possible. In some cases (see Figure 5a), the tremor spectrum had a quasi-flat-top trend, so the fitted Gaussian curve spanned an interval wider than the actual range of frequencies of interest, and the curve peak was not correctly identified. Although curves with a width larger than a predefined threshold were excluded from the dataset during the training phase, the risk of misclassification is high for subjects whose tremor spectra resemble the excluded ones.
A second problem is related to the quality of the fit. In some cases, a few small peaks around the spectrum’s dominant peak led the fitting process to produce a wider and lower curve, thus underestimating the actual height of the dominant peak (see Figure 5b). This could be avoided by restricting the fit interval to a neighborhood of the dominant (maximum) peak, the size of which could be defined a priori (for instance, 1 Hz per side). However, this workaround would work only for sharp dominant peaks (it could exclude part of a wider peak) and would depend on the size of the fit interval (which may still include undesired peaks). For these reasons, the more general approach represented by the second feature extraction strategy should be preferred.
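The restriction of the fit interval discussed above can be sketched as follows. This is only an illustrative sketch, not the procedure used in this work: the function name `peak_window`, the parameter `half_width_hz`, and the sample spectrum are hypothetical, and the Gaussian fit itself is left out.

```python
def peak_window(freqs, power, half_width_hz=1.0):
    """Return the dominant peak frequency and the samples around it.

    Keeps only the (frequency, power) pairs lying within half_width_hz
    of the dominant (maximum) peak, so a later fit ignores distant peaks.
    """
    if len(freqs) != len(power) or not freqs:
        raise ValueError("freqs and power must be non-empty and equally long")
    # Index of the dominant (maximum) spectral peak.
    i_peak = max(range(len(power)), key=power.__getitem__)
    f_peak = freqs[i_peak]
    # Keep only the samples inside the a priori neighborhood.
    window = [(f, p) for f, p in zip(freqs, power)
              if abs(f - f_peak) <= half_width_hz]
    return f_peak, window


# Hypothetical spectrum sampled every 0.5 Hz: dominant peak at 5 Hz,
# smaller secondary peak at 7 Hz (values are made up for illustration).
freqs = [3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5]
power = [0.1, 0.2, 0.6, 1.5, 2.0, 1.4, 0.5, 0.3, 0.8, 0.2]
f_peak, window = peak_window(freqs, power)
```

With a window of 1 Hz per side, the secondary peak at 7 Hz is excluded from the fit. A dominant peak wider than 2 Hz would be truncated, which is the limitation noted above.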
4.2. Classifier Performance
The training procedure produced several classifiers able to differentiate between Parkinsonian tremor and essential tremor on the basis of features extracted from different signals and approaches. The use of multiple sensors is usually preferred in an acquisition system, as this increases the available data, which in turn can help to improve the quantity and quality of the extracted information. Nevertheless, in miniaturized acquisition systems such as wearable devices, the area occupancy and power consumption of a sensor are two important factors that may outweigh the benefit of having more data available. Thus, many wearable devices do not provide both of the sensors used in this work (namely, accelerometer and gyroscope), or those that do come at a high price. In order to investigate the portability of the developed algorithms and to determine their reliability when some sensors are not available, three different feature sets were built for each extraction approach: two were based on raw data collected by the accelerometer and the gyroscope, respectively, while the third contained information from both data sources. For each learning algorithm, feature selection and fine hyperparameter tuning were performed to obtain a good predictive model. Finally, cross-validation resulted in the accuracies reported in the previous tables.
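As an illustration of the three feature sets described above, the following sketch splits one subject's features by sensor. The `acc_`/`gyr_` naming convention, the feature names, and the values are hypothetical and serve only to show the partitioning. They are not the identifiers or data used in this work.

```python
def build_feature_sets(subject_features):
    """Split a feature dictionary into accelerometer, gyroscope and mixed sets.

    Assumes (hypothetically) that feature names are prefixed with the
    sensor they come from: "acc_" or "gyr_".
    """
    acc = {k: v for k, v in subject_features.items() if k.startswith("acc_")}
    gyr = {k: v for k, v in subject_features.items() if k.startswith("gyr_")}
    # The mixed set simply contains the features from both sensors.
    return {"accelerometer": acc, "gyroscope": gyr, "mixed": {**acc, **gyr}}


# Made-up feature values for a single subject.
subject = {
    "acc_peak_power_task1": 0.82,
    "acc_peak_freq_task1": 5.1,
    "gyr_peak_power_task1": 0.64,
    "gyr_relative_power_task2": 0.31,
}
feature_sets = build_feature_sets(subject)
```

Each of the three sets can then be passed to the same feature selection and training procedure, so that single-sensor and mixed models are compared on equal terms.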
Generally speaking, classifiers based on power-related features performed better than those based on the parameters of the fitted Gaussian curves, with greater accuracies on average (91.4% versus 81.6%). These results, along with the previously discussed problems associated with the latter approach, favor classification based on signal power features over classification based on the spectrum fit.
Regarding datasets based on power features, the feature selection procedure revealed that the most frequently used feature was the Peak Power of task 1 (recurring in 9 out of 15 classifiers), followed by Relative Power (4 uses) and Peak Frequency of task 1 (3 uses). These results highlight the importance of the resting task (and partially the postural task) in the differentiation between Parkinsonian and essential tremors, as expected. The goal of task 1, in fact, is to emphasize the former tremor: in a population of only tremor-affected subjects and for a two-class problem, data collected within this task are sufficient to correctly classify almost all subjects. However, the remaining features, as well as the other tasks, might become relevant if the classification problem either extends to other tremor forms or includes healthy subjects to be classified as such. Using multiple sensors led to better overall performance, as expected, with out-of-sample accuracies of models trained with the mixed dataset slightly greater than those obtained with data from a single sensor. Specifically, the cross-validation revealed that the worst model trained on the mixed dataset (out-of-sample accuracy: 87.5%) misclassified only 1 out of 17 PT subjects and 2 out of 7 ET subjects, whereas the best ones (out-of-sample accuracy: 95.8%) correctly classified all the ET subjects and misclassified only 1 out of 17 PT subjects. Despite their lower performance compared to these latter models, it is important to underline that solutions based on data from either the accelerometer only or the gyroscope only still provided good results, with out-of-sample accuracies around 90%: this means that even wearable devices that do not host both of these sensors might be used to achieve the results described in this work.
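The accuracies quoted for the mixed dataset follow directly from the misclassification counts, assuming they are computed over all 24 subjects (17 PT and 7 ET) pooled across the cross-validation folds:

```latex
\[
\text{accuracy} = \frac{N_{\text{correct}}}{N_{\text{total}}}, \qquad
\text{worst model: } \frac{24 - (1 + 2)}{24} = \frac{21}{24} = 87.5\%, \qquad
\text{best models: } \frac{24 - 1}{24} = \frac{23}{24} \approx 95.8\%.
\]
```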
Although the best results were achieved by Support Vector Machine and Naïve Bayes classifiers, the choice of the final model should take into account not only the estimated prediction accuracy but also intrinsic properties of the specific learning algorithm, such as training time, interpretability, prediction time, and flexibility. For instance, Support Vector Machine classifiers typically have high accuracy, but Naïve Bayes classifiers require less training time. On the other hand, both may lack interpretability, an attribute for which decision trees and k-Nearest Neighbors are the best solutions. The authors do not select a final classifier at the time of writing, since this decision will depend on several other factors. Firstly, the sample needs to be enlarged with new subjects in order to obtain a higher number of observations, possibly balanced between Parkinsonian and essential tremors. Moreover, the model selection will depend on the computational power of the system that will host it, whether the wearable device itself or the Android system running the classification app.
Finally, as far as the substudy on differentiating healthy from tremor-affected subjects is concerned, the preliminary accuracies were lower than expected, ranging on average from 80% to 85%. Nevertheless, a good result was achieved by training a decision tree on the feature set based on angular rates, with an out-of-sample accuracy of 92.21% obtained by using two features only (see Section 2.3.3 for more details). Considering the “Patient” label as the positive class and the “Control” label as the negative class, the model’s k-fold sensitivity and specificity were 95.83% and 85.71%, respectively: only one tremor-affected subject out of 24 was misclassified as healthy, and two control cases out of 14 were misclassified as tremor-affected. Such results are encouraging, but further investigations are needed in the near future (1) to assess whether new, stronger features are necessary, and (2) to possibly modify the model optimization procedure (e.g., by changing the metric used in the feature selection process). Moreover, in parallel with the planned enlargement of the subject sample, new healthy subjects will be recruited to increase the range of control cases.
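The sensitivity and specificity reported above can be reproduced from the misclassification counts. The following minimal sketch assumes the counts are pooled over the cross-validation folds. The function name is illustrative.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Compute sensitivity and specificity from confusion-matrix counts.

    Positive class: "Patient"; negative class: "Control".
    """
    sensitivity = tp / (tp + fn)  # fraction of patients correctly detected
    specificity = tn / (tn + fp)  # fraction of controls correctly detected
    return sensitivity, specificity


# 24 patients (1 misclassified as healthy) and 14 controls
# (2 misclassified as tremor-affected), as reported in the text.
sens, spec = sensitivity_specificity(tp=23, fn=1, tn=12, fp=2)
print(f"sensitivity = {sens:.2%}, specificity = {spec:.2%}")
# prints "sensitivity = 95.83%, specificity = 85.71%"
```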
During the last few years, several monitoring systems and approaches have been proposed to enable the differentiation between essential and Parkinsonian tremors. For instance, di Biase et al. [27] defined a new metric called “tremor stability index” to discriminate between such tremors with high diagnostic accuracy. The index was based on the distribution of changes in the tremor frequency over time: this information was measured by means of a triaxial accelerometer taped to the wrist of the monitored subject during the execution of the aforementioned rest and postural tasks (see Section 2.2). The system was tested in a cohort of 36 patients affected by PT and ET, and it was validated on a second cohort comprising 55 further subjects: the diagnostic accuracy, assessed by binary logistic regression and by receiver operating characteristic analysis, was about 90%. A similar study conducted by Barrantes et al. [28] used a smartphone’s accelerometer to measure time-frequency differences between PT and ET. The study was carried out in a cohort of 52 subjects comprising patients affected by ET and PT, healthy subjects, and patients with tremor of undecided diagnosis. The smartphone was placed on the back of the hand presenting the tremor and recorded 30 s each of the resting and postural tasks. The data analysis was performed in the frequency domain, and a simple classifier was trained on the extracted discriminative features. The developed system correctly classified 49 out of 52 subjects in the category with/without tremor and 27 out of 32 patients in the category PD/ET, with a discrimination accuracy of 84.38%. The study presented in this document shares some aspects with these studies. In fact:
The measuring systems were based on an accelerometer, embedded either in a dedicated device or in a smartphone;
The monitored tasks were resting and postural ones;
The conducted analyses were based on the frequency features of the tremors.
Nevertheless, this work extends the past studies in several respects. First, the measuring system included not only a triaxial accelerometer but also a triaxial gyroscope: this additional data source allowed the collection of data of a different nature, which in turn contributed to enhancing the classification accuracies for both the ET vs. PT and the healthy vs. tremor-affected problems, as discussed above. Moreover, the small size of the measuring device potentially allows the use of two devices simultaneously, one mounted on the wrist and one mounted on a finger: this larger sensor network enables the collection of additional information that might be used to improve the differentiation accuracy. Finally, the addition of new tasks to be performed by the subjects increased the range of collectible tremor artifacts: this provided a wider variety of features, which in some cases proved to be more discriminating than those based on rest and postural tasks only (as in the case of features from task 4 in the healthy vs. tremor-affected differentiation). To conclude, the overall results achieved by the study presented in this document are in line with those in the literature, and in some cases better: the future developments, such as sample enlargement and training refinement, will aim to strengthen the models’ robustness and possibly further increase accuracy.