1. Introduction
Parkinson’s disease (PD) is a frequent neurodegenerative disorder that is associated with a substantial reduction of dopaminergic neurons, especially in the substantia nigra pars compacta [
1]. The primary motor symptoms of PD comprise tremor at rest, muscular rigidity, bradykinesia, and postural instability [
1]. Patients with PD also develop a variety of non-motor symptoms [
2] such as sleep disturbances, depression, and cognitive impairment. To diagnose, rate, and monitor motor and non-motor symptoms of PD, various clinical rating scales such as the Unified Parkinson’s Disease Rating Scale (UPDRS) [
3], Freezing Of Gait Questionnaire (FOG-Q) [
4], or Addenbrooke’s Cognitive Examination-Revised (ACE-R) [
5] have been developed. Nevertheless, reliability of the assessment is often reduced by inter-rater variability [
6].
Up to 90% [
7] of patients with PD develop a multi-dimensional speech disorder named hypokinetic dysarthria (HD) [
8], which is manifested in phonation, articulation, and prosody [
9,
10,
11]. In the area of phonation, insufficient breath support, reduction in phonation time, increased acoustic noise, instability of articulatory organs, microperturbations of frequency/amplitude, and harsh breathy voice quality have been observed [
9,
12]. HD leads to serious complications in daily communication of patients with PD [
13]. Generally, HD was found to be more severe in the advanced stages of PD [
14].
As reported by recent studies, acoustic analysis of HD can provide clinicians with a non-invasive and reliable methodology for PD diagnosis, assessment, and monitoring [
9,
15]. Moreover, this methodology has also been used to monitor the efficiency of PD treatment [
10,
16,
17,
18]. In the field of acoustic analysis of PD phonation, the authors mostly focused on the sustained vowel /a/ [
9]. Conventional phonatory features such as jitter, shimmer, harmonic-to-noise ratio, degree of unvoiced segments, and formant-based parameters extracted from this vowel have been widely used to diagnose PD [
12,
19,
20,
21,
22,
23]. Although Hazan et al. [
24] employed analysis of sustained phonation for diagnosis of PD even in its early stage, based on the recent review [
9], most of the researchers find relevant applications of the phonatory analysis especially in moderate or severe stages of this disorder.
For example, the analysis of sustained phonation has been utilized during PD severity assessment. In 2010, Tsanas et al. [
15] enrolled 42 PD patients and parameterized their sustained phonation of vowel /a/ by a set of conventional features that were subsequently mapped to UPDRS, part III (motor examination) and the total score of this scale. Using classification and regression trees, they estimated the UPDRS III score with MAE (mean absolute error) equal to 5.95. The total UPDRS score was estimated with
. A parametric version of this dataset has been made available for research purposes and other research teams further decreased the estimation error [
25,
26,
27]. Another work that deals with the automatic clinical scores estimation was published by Mekyska et al. [
21]. In this study, they acquired sustained phonation of vowels /a/, /e/, /i/, /o/, /u/ in 84 PD patients. Modeling conventional and advanced features by random forests provided the estimation of UPDRS III with
. In addition, the authors estimated several other clinical scores such as UPDRS, part IV (complications of therapy) with
or Beck depression inventory (BDI) with
.
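The studies above report estimation quality as the mean absolute error (MAE) between the clinical score and the score estimated from the acoustic features. The following is a minimal sketch of that measure; the function name and the sample scores are hypothetical and are not taken from any of the cited datasets.

```python
def mean_absolute_error(true_scores, predicted_scores):
    """Average absolute difference between clinical and estimated scores."""
    if len(true_scores) != len(predicted_scores) or not true_scores:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(t - p) for t, p in zip(true_scores, predicted_scores))
    return total / len(true_scores)


# Hypothetical UPDRS III scores (illustrative values only).
clinical = [20, 35, 12, 28]
estimated = [24, 30, 15, 28]
mae = mean_absolute_error(clinical, estimated)
print(f"MAE = {mae:.2f}")
```

An MAE is expressed in the points of the scale itself, so values obtained for scales with different ranges are not directly comparable.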
Even though HD is one of the most problematic aspects of PD, the number of longitudinal studies investigating the evolution of HD in PD over time (based on the acoustic analysis) is very limited [
28,
29,
30,
31]. If we focus specifically on longitudinal monitoring of sustained phonation, we can in fact identify only one study, which was published by Skodda et al. [
31]. In this work, the authors repeatedly (with average time interval 32.50 months) acquired sustained vowel /a/ in 32 female and 48 male PD patients (age in session 1: 66.28 ± 8.11 years; PD duration in session 1: 6.10 ± 4.63 years; UPDRS III in session 1: 20.16 ± 10.96; UPDRS III in session 2: 19.58 ± 8.29). The voice was quantified by jitter, shimmer, noise-to-harmonic ratio, and mean fundamental frequency. Based on the paired
t-test, the authors identified significant changes in shimmer and noise-to-harmonic ratio. In both cases, the values of these parameters increased. Another interesting finding is that, although some phonatory features significantly changed, UPDRS III remained largely stable over time. The authors provide two possible explanations: (1) voice impairment could be the result of an escalation of axial dysfunction too subtle to be mirrored by UPDRS III; (2) alterations of speech parameters could be completely independent of motor performance and may be based upon non-dopaminergic mechanisms. Inconsistencies in terms of the L-dopa effect on HD are further discussed in Brabenec et al. [
9].
To sum up, although the scientific community frequently addresses phonation in association with HD (especially when diagnosing or assessing PD), to the best of our knowledge, there is only one study that focuses on HD phonatory disorders from a longitudinal perspective. Moreover, that work analyzes phonation only partially: it considers only the sustained vowel /a/, and it does not explore the possibility of predicting PD progress based on a combination of acoustic analysis and machine learning. Therefore, within our two-year follow-up study, we go further with the following aims:
to identify phonatory acoustic features at baseline that are significantly correlated with changes in various clinical rating scales,
to investigate the relationship between changes in the phonatory acoustic features and the clinical rating scales after the two-year follow-up,
to establish mathematical models that will estimate the change in clinical rating scales based on the change in acoustic measures,
to compare results based on five vowels: /a/, /e/, /i/, /o/, /u/.
The rest of this article is organized as follows: Section 2 describes the dataset of PD patients as well as the methodology in terms of acoustic analysis, statistical analysis, and machine learning. Results are reported in Section 3 and subsequently discussed in Section 4. Finally, conclusions are given in Section 5.
3. Results
The values of 16 acoustic features extracted from both sessions, as well as values of their differences (session 2 − session 1), are reported in
Table 2. Based on the Wilcoxon signed-rank test, we can observe that none of the features extracted from vowel /a/ significantly changed after two years. Regarding vowel /e/, we identified significantly increased microperturbations in intensity of voice and also increased aperiodicity. The same significant changes were identified in vowels /i/ and /u/. In the case of vowel /u/, we additionally observed an increase of microperturbations in frequency of voice. The repeated acquisition of vowel /o/ was associated with increased aperiodicity and more dominant microperturbations in frequency of voice.
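The paired comparison of the two sessions can be illustrated with a minimal sketch of the Wilcoxon signed-rank statistic. It computes only the rank sums of positive and negative differences (session 2 − session 1); the p-value computation, the function names, and the sample values are assumptions made for illustration and are not the implementation used in this study.

```python
def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) - 1) / 2 + 1 for v in values]


def wilcoxon_signed_rank(session1, session2):
    """Return (W_plus, W_minus) for paired samples; zero differences are dropped."""
    diffs = [b - a for a, b in zip(session1, session2) if b != a]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return w_plus, w_minus


# Hypothetical values of one acoustic feature for six patients (arbitrary units).
baseline = [10, 12, 9, 15, 11, 14]
follow_up = [13, 11, 14, 15, 13, 18]
w_plus, w_minus = wilcoxon_signed_rank(baseline, follow_up)
print(f"W+ = {w_plus}, W- = {w_minus}")
```

A large imbalance between the two rank sums indicates a systematic shift between sessions; the smaller of the two is the statistic usually compared against the reference distribution.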
The results of Spearman’s partial correlation between the baseline acoustic features (session 1) and change in clinical data (
) can be seen in
Table 3. None of the features significantly correlated with UPDRS, part III. On the other hand, in the case of part IV, we can observe negative correlation with aperiodicity (FLUF, vowels /e/, /i/, /o/, /u/), i.e., low aperiodicity at the baseline resulted in increased complications with therapy. Similarly, we identified negative correlation with tremor of jaw (F2 (CV), vowel /a/), but positive correlation with the tremor of lips (F3 (CV), vowel /o/). Other positive correlations were observed with median of energy ratio (vowels /o/, /u/), irregular pitch fluctuations (F0 (CV), vowel /a/), and variability of voice quality (GNE (SD), vowel /a/). Change in UPDRS IV negatively correlated with irregular amplitude fluctuations (TEO (CV), vowel /u/), acoustic noise (NNE (Q2), vowel /u/) and its variation (NNE (SD), vowel /a/). Results linked with the acoustic noise quantified by the median GNE are not consistent.
RBDSQ significantly and positively correlated with microperturbations in frequency of voice (PPQ, vowel /u/) and microperturbations of its intensity (APQ, vowel /a/), i.e., increased microperturbations in frequency/amplitude at the baseline resulted in deterioration of sleep. In addition, RBDSQ negatively correlated with the variation of voice quality (HNR (SD), vowel /o/).
Regarding gait difficulties, as assessed by FOG-Q, we can observe two positive correlations with tremor of jaw (F1 (CV), vowel /i/) and irregular pitch fluctuations (F0 (CV), vowel /a/). The total score of this questionnaire negatively correlates with variation of acoustic noise (NNE (SD), vowel /o/).
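Spearman’s partial correlation, as used for the baseline analysis above, can be sketched as a Pearson correlation on ranks with the influence of a confounder removed. The sketch below assumes a single covariate and uses the first-order partial correlation formula; the covariates actually controlled for in this study are not restated in this section, so the `covariate` argument and the sample data are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) - 1) / 2 + 1 for v in values]


def pearson(x, y):
    """Pearson correlation coefficient (inputs must not be constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_partial(feature, score_change, covariate):
    """Rank correlation of feature and score_change, controlling for one covariate."""
    rx, ry, rz = (average_ranks(v) for v in (feature, score_change, covariate))
    r_xy, r_xz, r_yz = pearson(rx, ry), pearson(rx, rz), pearson(ry, rz)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical data: baseline feature, change in a clinical score, one confounder.
rho = spearman_partial([1, 2, 3, 4, 5], [2, 1, 4, 3, 5], [3, 1, 5, 2, 4])
print(f"partial rho = {rho:.3f}")
```

With several covariates, the same idea is typically applied by regressing the ranks of both variables on the ranks of all covariates and correlating the residuals.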
The results of Spearman’s partial correlation between the change of baseline acoustic features (
) and the change in clinical data (
) can be seen in
Table 4. Regarding the change of UPDRS III, it negatively correlated with the change of microperturbations in frequency of voice (PPQ, vowel /i/), aperiodicity (FLUF, vowels /e/, /o/), tremor of tongue (F1 (CV), vowels /a/, /u/), tremor of jaw (F2 (CV), vowel /e/), irregular pitch fluctuations (F0 (CV), vowels /a/, /u/), and variation of acoustic noise (NNE (SD), vowel /i/). Significant positive correlations were identified with the change of lips tremor (F3 (CV), vowel /a/), acoustic noise (ER (Q2), vowel /a/), and variation of voice quality (GNE (SD), vowel /e/).
In the case of UPDRS IV, we identified seven significant positive correlations with the change of microperturbations in frequency of voice (PPQ, vowel /e/), tremor of jaw (F2 (CV), vowel /a/), irregular amplitude fluctuations (TEO (CV), vowels /a/, /u/), and acoustic noise (NNE (Q2), vowels /o/, /u/). The change in UPDRS IV significantly negatively correlated with the change of acoustic noise (ER (Q2), vowel /u/), and its variation (ER (CV), vowel /e/).
Changes in RBDSQ significantly negatively correlated with the change of microperturbations in frequency of voice (PPQ, vowel /u/), microperturbations of its intensity (APQ, vowels /e/, /i/, /u/), tremor of lips (F3 (CV), vowel /o/), acoustic noise (NNE (Q2), vowel /e/), and its variation (ER (CV), vowel /e/). Positive correlations were identified with the change in voice quality (HNR (Q2), all vowels) and its variability (HNR (SD), vowels /e/, /o/, /u/). Similar results can be observed when assessing the quality by GNE (vowel /e/).
Finally, in terms of changes in FOG-Q, we identified significant negative correlations with the change in aperiodicity (FLUF, vowels /a/, /e/, /o/, /u/), tremor of jaw (F1 (CV), vowel /i/), tremor of tongue (F2 (CV), vowel /u/), and variation of acoustic noise (ER (CV), vowels /e/, /i/). One significant positive correlation can be observed with the change in acoustic noise variation (NNE (SD), vowel /o/). The results based on irregular amplitude fluctuations (TEO (CV)) are not consistent.
The results of the clinical scales’ estimation are reported in
Table 5. Using the acoustic analysis of sustained phonation of the baseline vowel /e/ in combination with mathematical modeling based on the XGBoost algorithm, we estimated the change in UPDRS III score with 25.7% error (
,
). The change in UPDRS IV was estimated with the lowest error equal to 11.3% (
,
) when employing acoustic analysis of the baseline vowel /o/. The change in RBDSQ was estimated with 16.3% error (
,
) based on phonatory analysis of vowel /i/. Finally, the lowest error of FOG-Q change estimation is 13.2% (
,
). In this case, the acoustic analysis of vowel /u/ outperformed the other ones.
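To make errors on different scales comparable, an absolute error can be normalized by the range of the predicted quantity and expressed in percent. The sketch below shows one plausible normalization of this kind; it is an assumption for illustration, since the exact error definition belongs to the methodology section and is not restated here, and the function name, range value, and data are hypothetical.

```python
def estimation_error_rate(true_changes, predicted_changes, value_range):
    """Mean absolute error as a percentage of the range of the target values."""
    if value_range <= 0:
        raise ValueError("value_range must be positive")
    if len(true_changes) != len(predicted_changes) or not true_changes:
        raise ValueError("inputs must be non-empty and of equal length")
    mae = sum(abs(t - p) for t, p in zip(true_changes, predicted_changes))
    mae /= len(true_changes)
    return 100.0 * mae / value_range


# Hypothetical two-year changes in a clinical score and their estimates.
observed = [2, -3, 5, 0]
estimated = [4, -1, 1, 2]
error = estimation_error_rate(observed, estimated, value_range=20)
print(f"error = {error:.1f}%")
```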
Due to inter-rater variability as well as intra-rater variability [
52,
53,
54], consistent scoring of PD using the commonly used clinical rating scales is not an easy task. Automatic scoring, i.e., the estimation of the values of the clinical rating scales, must be viewed as a tool that can provide clinicians with additional, unbiased, and objective information that can help them with their decision-making, not as a tool that will substitute the work of clinicians. With this in mind, the predictions made by the trained XGBoost models can be considered rather reasonable as the error of 10–20% is comparable with a deviation caused by inter/intra-rater variability. Moreover, each clinical rating scale is different. On the one hand, there are complex scales such as UPDRS III describing various motor aspects of PD, and, on the other hand, there are scales specifically focusing on a subset of its symptoms, e.g., FOG-Q (gait difficulties), RBDSQ (sleep disorders), etc. This information must be taken into account when evaluating the prediction errors because the more complex the scale is, the more difficult it becomes to predict its values. This can be seen in our results as well. The most complex of the scales was predicted with the largest prediction error.
Feature importances of the XGBoost models are visualized in
Figure 2. The figure shows the feature importances for all of the trained models. Feature importances quantify the relative importance of the features in the ensemble of the trained XGBoost model [47]. Therefore, the higher the value of the feature importance, the more important the feature is for the prediction of the dependent variable. With this in mind, the rationale behind this visualization is to show which features are important, and how strong that importance is, for the trained models when predicting the change in the particular clinical rating scales over a two-year horizon given the acoustic features at baseline.
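One common way to obtain a relative feature importance from a tree ensemble is to count how often each feature is chosen at a split and normalize the counts. The sketch below illustrates this split-count variant only; gradient boosting libraries offer several importance types (for example, gain-based ones), and which type underlies Figure 2 is not stated in this section, so the choice here is an assumption and the split list is hypothetical.

```python
from collections import Counter


def relative_importance(split_features):
    """Normalize per-feature split counts so that the importances sum to 1."""
    counts = Counter(split_features)
    total = sum(counts.values())
    # most_common() orders features from the most to the least frequently used.
    return {name: n / total for name, n in counts.most_common()}


# Hypothetical features chosen at the split nodes of a small ensemble.
splits = ["FLUF", "GNE (Q2)", "FLUF", "ER (Q2)",
          "FLUF", "GNE (Q2)", "FLUF", "ER (Q2)"]
importance = relative_importance(splits)
for name, value in importance.items():
    print(f"{name}: {value:.2f}")
```

Features that never appear in a split receive no entry at all, which mirrors the observation that some models rely on only a few phonatory parameters.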
Based on these graphs, we can conclude that the estimation of UPDRS III change requires a complex parametrization because, in all scenarios, at least 13 acoustic features were employed. In this case, median NNE in particular was rarely used. Although the models estimate the change of UPDRS IV with the lowest error, they usually use just a few phonatory parameters. In fact, in the case of vowel /o/, we observed 11.3% estimation error based on the following three phonatory features: GNE (Q2), ER (Q2), and FLUF. Generally, these features quantify quality of voicing. The best estimation of the RBDSQ change is based on eight phonatory parameters extracted from vowel /i/. The most important features quantify tremor of jaw (F1 (CV)), aperiodicity (FLUF), and microperturbations in intensity (APQ). Finally, based on the feature importances, we can observe that the most important role in FOG-Q change estimation was played by formant frequencies quantifying tremor of the articulatory organs.
4. Discussion
Although the only existing longitudinal study [
31] differs in the interval between sessions (32.5 vs. 24.0 months), we are going to compare our findings with the results reported by these authors. In contrast to Skodda et al., who observed significant change in shimmer of the sustained phonation of vowel /a/, we have not identified any significant differences in this vowel. Nevertheless, we identified significant changes in the same feature extracted from vowels /e/, /i/, /u/. In addition, we observed some significant changes in jitter and FLUF. Based on these results, we can conclude that, over two years, patients’ voices became more aperiodic with increased microperturbations of frequency and amplitude.
None of the acoustic features at baseline significantly correlated with a change in UPDRS III, which supports the results of the clinical scales’ estimation where the lowest estimation error was above 25%. However, we identified some significant correlations between changes of phonatory features and the clinical scale. Surprisingly, except for tremor of lips (F3 (CV)), acoustic noise (ER (Q2)), and variation of voice quality (GNE (SD)), worsening in UPDRS III (motor performance) was associated with improvement in phonatory characteristics. This could be explained by the fact that HD belongs to axial symptoms [
9,
31] that do not play a significant part in UPDRS III. In other words, although several significant correlations were identified, we hypothesize that some underlying pathophysiological mechanisms are involved and a direct interpretation is not possible.
Regarding the change in complications of therapy (as assessed by UPDRS IV), although the most significant correlations were observed with baseline features extracted from the vowel /a/, the lowest estimation error (11%) was based on vowel /o/. In this case, low aperiodicity, but increased lips tremor and increased acoustic noise at baseline, was associated with increased complications in the follow-up examination.
Only three significant correlations are reported between baseline acoustic parameters (quantifying microperturbations of frequency/amplitude and variation in voice quality) and change in RBDSQ. Although we have not identified any significant correlations based on vowel /i/, the XGBoost algorithm reached the lowest error (16%) including features calculated from this vowel. This result could originate from the ability of XGBoost to model complex interdependencies that are not evident at first sight [
47]. Regarding the partial correlations between changes in RBDSQ and phonatory features, we can conclude that mainly changes in voice aperiodicity and voice quality are linked with changes in sleep disorders.
HD and freezing of gait (FOG) are both axial symptoms of PD [
55]. In our recent work, we found that these symptoms share some pathophysiological mechanisms [
56]. More specifically, we showed that FOG is mainly linked with improper articulation, disturbed speech rate, and intelligibility. We did not identify any significant relations between FOG and phonatory features. On the other hand, we analyzed only the sustained vowel /a/ and partial correlations were calculated only with some baseline FOG-Q sub-scores. The current study provides deeper and more complex results in terms of the relations between FOG and phonatory features. The first correlation analysis (baseline features vs.
FOG-Q) identified just a few significant correlations. However, based on mainly formant frequencies extracted from vowel /u/, the XGBoost model estimated the change in FOG-Q with 13% error. Generally, the significant impact of formants in this specific task is in line with our previous study [
56]. The second correlation analysis (
of the baseline features vs.
FOG-Q) revealed some relations between changes in FOG and changes in aperiodicity, tremor of jaw/tongue, and acoustic noise.
Although most of the studies dealing with the acoustic analysis of phonation in PD patients focus on sustained vowel /a/, it is not sufficiently explained why this corner vowel is more important than the other two, i.e., /i/ or /u/. Looking at the Hellwag (vowel) triangle [
34], we can see that, during phonation of vowel /a/, the tongue is in its lowest position from a vertical point of view, and in its central position from a horizontal one. In other words, a speaker does not have to make an effort to keep the tongue in a limit position (the tongue is almost relaxed). Therefore, some phonatory disorders may not be accentuated. This limitation is not present in vowels /i/ or /o/, where the speaker has to exert a force in both directions. On the other hand, the lowest limit position of the jaw is reached during the phonation of vowel /a/. In summary, although some research teams employed a more complex set of vowels in their experiments [
19,
20,
21,
23,
24,
35,
36,
37], the vowel /a/ is still the most frequently used one. However, this choice should be supported by a complex, robust, and multilingual study (theoretically, the effect of culture and language plays no role here, but this should be proven as well). Based on these assumptions, we have decided to explore the significance of all five Czech vowels. In addition, the results suggest that the progress of PD is reflected in each vowel differently. Moreover, each vowel correlates differently with changes in scores of clinical scales. Finally, in our case, the best prediction of the change in the clinical rating scales under consideration has never been based on phonatory parameters of the vowel /a/. If we had to choose one optimal candidate for predicting changes in the considered clinical scores (see
Table 5), it would be the corner vowel /i/, where the tongue is in limit position in both directions.
In our previous works, we proved that HD shares some pathophysiological mechanisms with other motor/non-motor features of PD. For instance, based on a combination of acoustic analysis and machine learning approaches, it is possible to predict cognitive deficits or gait disorders [
44,
56]. Although in the frame of this research we explored only the field of phonation, our results confirm the ability of acoustic HD analysis to predict the progress of PD. These findings and conclusions could have practical applications in eHealth, mHealth and generally Health 4.0 systems that could be used to remotely monitor and assess motor/non-motor deficits in PD patients.