Vocal Feature Changes for Monitoring Parkinson’s Disease Progression—A Systematic Review

Wright, Helen; Aharonson, Vered

doi:10.3390/brainsci15030320

by Helen Wright ¹,* and Vered Aharonson ¹,²

¹ School of Electrical and Information Engineering, University of the Witwatersrand, Johannesburg 2050, South Africa

² Department of Basic and Clinical Sciences, Medical School University of Nicosia, CY-1700 Nicosia, Cyprus

* Author to whom correspondence should be addressed.

Brain Sci. 2025, 15(3), 320; https://doi.org/10.3390/brainsci15030320

Submission received: 3 March 2025 / Revised: 14 March 2025 / Accepted: 17 March 2025 / Published: 19 March 2025

(This article belongs to the Special Issue New Approaches in the Exploration of Parkinson’s Disease)

Abstract

Background: Parkinson’s disease has a significant impact on vocal characteristics and speech patterns, making them potential biomarkers for monitoring disease progression. To effectively utilise these biomarkers, it is essential to understand how they evolve over time as this degenerative disease progresses. Objectives: This review aims to identify the most used vocal features in Parkinson’s disease monitoring and to track the temporal changes observed in each feature. Methods: An online database search was conducted to identify studies on voice and speech changes associated with Parkinson’s disease progression. The analysis examined the features and their temporal changes to identify potential feature classes and trends. Results: Eighteen features were identified and categorised into three main aspects of speech: articulation, phonation and prosody. While twelve of these features exhibited measurable variations in Parkinsonian voices compared to those of healthy individuals, insights into long-term changes were limited. Conclusions: Vocal features can effectively discriminate Parkinsonian voices and may be used to monitor changes through disease progression. These changes remain underexplored and necessitate more evidence from long-term studies. The additional evidence could provide clinical insights into the disease and enhance the effectiveness of automated voice-based monitoring.

Keywords: Parkinson’s disease; vocal features; progression; monitoring

1. Introduction

Parkinson’s disease (PD) is a complex neurodegenerative disease characterised by the progressive loss of dopaminergic neurons in the substantia nigra pars compacta. The course of the disease may be influenced by many factors, including the development of Lewy bodies, genetic and environmental aspects and endocrine abnormalities [1]. These factors may also lead to the symptoms that are observed as the disease progresses.

A worsening of symptoms marks the progression of Parkinson’s disease over time. Clinical rating scales are used by healthcare professionals to capture the symptoms’ temporal patterns and to assess the disease progression. Historically, these scales focused on motor functions and physical disability but have been expanded to include non-motor aspects, such as cognitive functions [2,3,4,5,6]. This offers a more comprehensive understanding of the long-term effects of the disease and its underlying pathology [7]. Since the rating scales predominantly rely on observations by healthcare professionals, they are inherently subjective. The training, skills and experience of the examining person may bias the rating and potentially introduce inter- and intra-rater variability [8,9]. These assessments must be conducted in person, posing challenges for patients in remote locations or those in advanced stages of the disease. The rising prevalence of PD and the high healthcare costs further complicate diagnosis, monitoring and management. Consequently, there is a growing effort to find alternative biomarkers that can be tested frequently and remotely to accurately assess PD status [10].

A promising solution is the use of a patient’s voice and speech patterns as biomarkers for disease progression. Speech can be acquired easily and conveniently using smartphones or home devices, enabling remote assessments, anytime and anywhere. This digital acquisition improves accessibility and enables digital analysis in a standard, objective manner. Patient surveys and clinical studies have documented the adverse effects of PD on speech, linking them to dysfunction in the vocal tract muscles. Common characteristics of Parkinsonian speech include a hoarse and breathy voice, monotone delivery, unchanging volume, lack of emotive expression and alterations in intonation and rhythm [11,12,13]. Speech production integrates various physiological subsystems, suggesting that changes may occur early in the disease, even before the onset of motor symptoms [14]. These attributes make speech a good candidate for monitoring PD progression.

Automated speech analysis for PD diagnosis and monitoring is supported by research on speech and voice patterns in conditions such as fatigue and emotional and mental states and in diseases such as laryngeal pathologies and COVID-19 [15,16,17,18,19,20,21]. Leveraging these insights, recent studies employed similar features extracted from PD patients’ recorded speech and utilised machine learning classifiers to estimate disease severity as reflected in the Unified Parkinson’s Disease Rating Scale (UPDRS) score or a Hoehn and Yahr (H&Y) disease stage rating. These features are well-documented voice quality metrics, such as jitter, shimmer, spectral and cepstral features [22,23,24]. Machine learning models, such as support vector machines, regression algorithms and random forests, have shown high accuracy in the diagnosis of PD [25,26]. However, their applicability to clinical settings requires a detailed description of how vocal features change as PD progresses. The existing literature either compares vocal features of Parkinson’s disease patients to healthy controls to identify differences or assesses long-term deterioration by comparing features at different stages of the disease to identify trends.

This paper provides a comprehensive review of vocal features used for longitudinal voice-based assessments of PD severity. It aims to provide an in-depth analysis of how these features change with PD progression. By synthesising existing research, this review seeks to provide valuable insights for biomedical researchers to enhance the accuracy and explainability of automated PD monitoring systems.

This paper is organised as follows: Section 2 provides a summary of related work, providing context for this paper and highlighting the gaps; Section 3 outlines the methods of this review; the results are presented in Section 4; and a discussion of the results and suggestions for future work are presented in Section 5.

2. Related Work

Increased interest in using voice as a measure of PD severity has produced a growing number of studies and reviews on vocal features. Researchers have examined the effectiveness of various vocal features, feature extraction techniques and pattern recognition methods. The following summary provides an overview of these studies.

Moro-Velazquez et al. [27] reviewed studies investigating the relationship between articulatory and phonatory features of speech and UPDRS scores. They concluded that both feature sets effectively estimate PD severity. Mel-frequency cepstral coefficients (MFCCs), perceptual linear prediction (PLP) coefficients and vowel articulation indicators were used for machine learning algorithms to predict UPDRS scores. Articulatory features (e.g., voice onset time, voice breaks) yielded higher accuracy (80–95%) than phonatory features (e.g., jitter, shimmer), which achieved 75–90% accuracy. The review also noted the importance of methodological control for demographic factors, medication effects and the use of appropriate cross-validation strategies.

Ngo et al. [28] conducted a systematic review of speech and voice analysis for detecting or monitoring PD. The review aimed to determine which speech tasks and acoustic features best reflected disease severity. They found that sustained vowel phonation features were more effective for distinguishing PD from healthy controls, while speaking or reading tasks were better suited for assessing severity or monitoring treatment. Common features included MFCCs, jitter, shimmer, harmonics-to-noise ratio (HNR), fundamental frequency, speech rate and articulatory acoustic vowel space. Support vector machines, random forests, neural networks and deep learning models provided accurate severity estimates, often exceeding 90%. However, results may vary across languages, and missing data points pose challenges for longitudinal studies.

Amato et al. [29] reviewed the application of state-of-the-art analysis methodologies to Parkinsonian speech. The authors concurred with Ngo et al. [28] on the most used acoustic features. MFCCs were found to be particularly effective in the diagnosis of PD and in the estimation of UPDRS scores. The authors emphasised that an inclusion of non-speech factors, such as gender, age, language and time since diagnosis, in the feature sets can enhance model performances.

Reviews by Jones [30] and Moro-Velazquez and Dehak [31] focused on the prosodic aspects of speech. Jones [30] identified the perceptual features of PD as monopitch, monoloudness, reduced intensity and abnormal speaking rate. Acoustic studies support these findings, indicating decreased variability in fundamental frequency and intensity. Moro-Velazquez and Dehak [31] found that PD patients struggle to modify their speech rate and exhibit unorthodox breath pauses while speaking. They also demonstrated that prosodic features alone can support PD diagnosis and monitoring, but combining them with phonatory and articulatory features improves results.

The abovementioned reviews summarise the current research on Parkinsonian voice analysis. However, a significant gap remains in understanding how individual vocal features, and groups of features, change due to PD. Features are identified, but insights into their longitudinal alterations are lacking. This absence of comprehensive acoustic models is documented [27,29] and hinders the development of precise tools for the diagnosis and monitoring of PD. Therefore, a thorough examination of studies analysing Parkinsonian voices is needed to identify those acoustic features and their temporal changes. This study addresses this gap by asking the following: What are the documented changes in vocal features that are associated with PD diagnosis and long-term monitoring?

3. Methods

This review used the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework to ensure rigour, reliability and reproducibility. The search was conducted in four widely used electronic databases: Scopus, PubMed, Web of Science and IEEE Xplore. The search was limited to English-language articles and conference proceedings published from inception to 31 November 2024.

Relevant works were identified using the following keyword string: [(Parkinson’s disease) AND (monitoring OR estimation OR progression) AND (voice OR speech OR vocal) AND (features OR analysis) AND changes]. To ensure comprehensive capture, synonyms, such as “variations” and “characteristics”, were included. This search string targeted studies on Parkinson’s disease that focus on monitoring, severity estimation or long-term progression of the condition, specifically in relation to voice, speech or vocal aspects. The inclusion of “features”, “analysis” or “characteristics” ensured the capture of studies examining specific attributes of speech or voice changes due to Parkinson’s disease. The search was performed in the title, abstract and keyword fields to maximise the results and recall. The complete search and screening process is illustrated in Figure 1 below.

3.1. Study Selection

After completing the initial article search, the titles and abstracts of all retrieved records were screened by the authors manually. No automation tools were used. Full-text articles were assessed for eligibility based on the predefined inclusion criteria listed below:

English journal articles and conference proceedings analysing voice and speech changes due to Parkinson’s disease.
Studies describing and assessing measurable changes to the acoustic, phonatory and prosodic characteristics of speech due to PD (as compared to healthy controls).
Longitudinal studies assessing changes to the acoustic, phonatory and prosodic characteristics of speech due to PD over time.

For the purposes of this review, the following article topics were excluded:

Studies on PD diagnosis or longitudinal monitoring using other symptoms, such as gait disorders and REM sleep disturbances.
Studies analysing the effects of medications, therapies or surgeries.
Review articles on techniques used for PD voice assessment.
Studies measuring PD progression relative to other rating scales, e.g., the PD composite scale.
Studies focusing on identifying and analysing cognitive and neurological changes brought on by PD.
Studies relying solely on self-assessment or perceptual changes to speech and voice.

3.2. Data Extraction

The following information was extracted from the selected publications: the primary study aim, type of study (diagnosis or monitoring), data used, the features extracted from the dataset recordings, analysis methods used, noted variations in vocal features and outcomes and results reported. This information was then summarised to identify trends and notable findings, as presented in Section 4 and Section 5.

3.3. Search Limitations

Every effort was made to be thorough and inclusive in the search for studies on this niche topic. However, the following limitations were identified:

Exclusion of other rating scales: While UPDRS and H&Y were chosen for their prevalence in both clinical and research applications, excluding other scales may have limited the findings and the analysis results.
Language limitation: Including only English-language articles may have resulted in the omission of some findings.
Exclusion of therapeutic and perceptual assessments: Excluding studies explicitly investigating therapies, perceptual and self-assessments and neurological assessments may have excluded certain results and findings.

4. Results

Speech and voice analysis uses a diverse array of features, each offering insights into different pathological changes and having different potential applications. These applications include PD diagnosis, severity monitoring through estimation of a rating scale score and assessment of treatment efficacy. In the context of speech production and acoustics, vocal features are commonly grouped into three categories: phonation, articulation and prosody [32,33]. This framework is maintained in our discussion. Table 1 summarises the predominant vocal features in PD monitoring studies and their changes. A detailed examination of these findings follows in subsequent sections.

4.1. Phonatory Features

Phonatory features of speech provide information on the functioning of the physiological structures used for breathing and phonation. These include the diaphragm, muscles of the larynx and vocal folds. They are extracted from sustained vowel sounds and can provide insight into pathological changes in the larynx.

4.1.1. Jitter

Jitter is an acoustic feature that measures the variations in the period of consecutive glottal pulses. It may be calculated and represented in several different ways, including absolute jitter, relative percentage jitter and average values calculated using three or more adjacent periods. Due to its ease of evaluation, jitter is one of the most commonly assessed features in voice studies and one of the first to be considered for pathological analysis. It has been featured in many classification studies in which features are compared between healthy and diseased voices, and findings from these suggest that the jitter value from a pathological voice would be higher than that from a healthy voice [34,35,49]. However, the size of this increase has not been quantified. Extended longitudinal studies that evaluate changes in jitter values over longer periods of time are also lacking; thus, further deterioration cannot be quantified.
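As an illustration of the jitter variants mentioned above, the following minimal Python sketch computes absolute jitter, relative percentage jitter and a three-period average perturbation from a list of glottal period durations. It assumes that the periods have already been extracted from a sustained vowel recording. The function names and the example values are hypothetical and are not taken from any of the reviewed studies.

```python
from statistics import mean


def jitter_absolute(periods):
    """Mean absolute difference between consecutive glottal periods (seconds)."""
    if len(periods) < 2:
        raise ValueError("at least two periods are required")
    return mean([abs(b - a) for a, b in zip(periods, periods[1:])])


def jitter_relative(periods):
    """Absolute jitter expressed as a percentage of the mean period."""
    return 100.0 * jitter_absolute(periods) / mean(periods)


def jitter_rap(periods):
    """Three-period perturbation: deviation of each period from the mean of
    itself and its two neighbours, as a percentage of the mean period."""
    if len(periods) < 3:
        raise ValueError("at least three periods are required")
    deviations = [
        abs(periods[i] - (periods[i - 1] + periods[i] + periods[i + 1]) / 3.0)
        for i in range(1, len(periods) - 1)
    ]
    return 100.0 * mean(deviations) / mean(periods)


# Hypothetical period durations in seconds (roughly a 100 Hz voice).
example_periods = [0.0100, 0.0102, 0.0099, 0.0101, 0.0100]
print(jitter_absolute(example_periods))
print(jitter_relative(example_periods))
print(jitter_rap(example_periods))
```

A perfectly periodic signal gives a jitter of zero under all three definitions, so larger values indicate less regular vocal fold vibration.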

4.1.2. Shimmer

Shimmer, often studied and reported alongside jitter, is another measure derived from the glottal pulses. It quantifies the difference in amplitudes between consecutive pulses. As with jitter, there are several different ways to calculate and represent it. It has also featured heavily in classification studies, with findings suggesting higher values in pathological voices than in healthy voices [35,48,52]. However, the exact increase has not been quantified, and no long-term longitudinal studies exist to provide details on how shimmer may be affected over the total course of the disease.
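The amplitude perturbation described above can be sketched in the same way. The minimal Python example below computes a local shimmer percentage and a decibel variant from a list of peak amplitudes, one per glottal pulse. It assumes that the amplitudes are positive and have already been measured; the function names and the example values are hypothetical.

```python
import math
from statistics import mean


def shimmer_local(amplitudes):
    """Mean absolute amplitude difference between consecutive pulses,
    as a percentage of the mean amplitude."""
    if len(amplitudes) < 2:
        raise ValueError("at least two amplitudes are required")
    diffs = [abs(b - a) for a, b in zip(amplitudes, amplitudes[1:])]
    return 100.0 * mean(diffs) / mean(amplitudes)


def shimmer_db(amplitudes):
    """Mean absolute log-ratio of consecutive pulse amplitudes, in decibels.
    Amplitudes must be strictly positive."""
    if len(amplitudes) < 2:
        raise ValueError("at least two amplitudes are required")
    ratios = [abs(20.0 * math.log10(b / a)) for a, b in zip(amplitudes, amplitudes[1:])]
    return mean(ratios)


# Hypothetical peak amplitudes (arbitrary linear units).
example_amplitudes = [0.80, 0.78, 0.83, 0.79, 0.81]
print(shimmer_local(example_amplitudes))
print(shimmer_db(example_amplitudes))
```

The decibel form depends only on amplitude ratios, so it is unaffected by the overall recording gain, which may be convenient when recordings come from different devices.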

4.1.3. Harmonics-to-Noise Ratio

The harmonics-to-noise ratio (HNR) quantifies the amount of noise present in a voice signal relative to its harmonic content. This noise may be due to incomplete closure of the vocal folds, which allows excess air to pass through [35]. Since the HNR is a ratio of harmonic signal to noise, higher values indicate better signal quality with lower noise content, while lower values indicate the opposite. The studies reviewed found that HNR values were lower in PD-affected voices than in healthy voices [49], although longitudinal studies and long-term results are not available. The inverse of the HNR, namely the noise-to-harmonics ratio, is also sometimes reported, showing correspondingly opposite results [34,48,53].
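One common way of estimating the HNR, assumed here for illustration and not necessarily the method used in the reviewed studies, takes the peak of the normalised autocorrelation within the plausible pitch range as the harmonic fraction r of the signal power and reports 10·log10(r / (1 − r)). The Python sketch below applies this to a single frame; the helper `synthetic_tone`, the pitch range and the clipping constant are hypothetical choices.

```python
import math
import random


def normalised_autocorr(x, lag):
    """Normalised cross-correlation between a frame and its lagged copy."""
    n = len(x) - lag
    num = sum(x[i] * x[i + lag] for i in range(n))
    e0 = sum(x[i] ** 2 for i in range(n))
    e1 = sum(x[i + lag] ** 2 for i in range(n))
    den = math.sqrt(e0 * e1)
    return num / den if den > 0.0 else 0.0


def hnr_db(x, fs, f0_min=75.0, f0_max=500.0):
    """HNR estimate (dB) from the autocorrelation peak in the pitch lag range."""
    lag_min = int(fs / f0_max)
    lag_max = int(fs / f0_min)
    if len(x) <= lag_max:
        raise ValueError("frame too short for the lowest pitch considered")
    r_max = max(normalised_autocorr(x, lag) for lag in range(lag_min, lag_max + 1))
    # Clip to keep the logarithm finite for (nearly) noise-free or aperiodic frames.
    r_max = min(max(r_max, 1e-6), 1.0 - 1e-6)
    return 10.0 * math.log10(r_max / (1.0 - r_max))


def synthetic_tone(f0_hz, fs, duration_s, noise_sd=0.0, seed=0):
    """Hypothetical test signal: a sine wave plus Gaussian noise."""
    rng = random.Random(seed)
    n = int(fs * duration_s)
    return [math.sin(2.0 * math.pi * f0_hz * i / fs) + rng.gauss(0.0, noise_sd)
            for i in range(n)]


fs = 8000
print(hnr_db(synthetic_tone(100.0, fs, 0.1), fs))                # clean tone
print(hnr_db(synthetic_tone(100.0, fs, 0.1, noise_sd=0.5), fs))  # noisy tone
```

Adding noise to the synthetic tone lowers the estimate, which mirrors the direction of change reported for PD-affected voices.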

4.1.4. Glottal-to-Noise Excitation Ratio

This feature assesses the relative contributions of vocal fold oscillations and noise to a total voice signal. It is often used in machine learning classifiers to distinguish between PD patients and healthy controls, although the details of the changes in this feature value are not specified. It has not been featured in any longitudinal studies to date [55,56,57].

4.1.5. Correlation Dimension (D2)

Originating in chaotic signal analysis theory, this feature describes the complexity and uncertainty present in vocal sounds. Some studies have identified that an increased D2 value is related to an increase in signal irregularity, and therefore, these values would be higher in pathological voices than in healthy ones [58]. However, it has been noted that D2 is susceptible to spurious noise, such as from recording devices, and, therefore, may not be completely reliable for the task of voice analysis [61].

4.1.6. Pitch Period Entropy

In signal analysis, entropy quantifies signal uncertainty. In speech analysis, entropy-based features quantify the inherent complexity of the voice signal. Specifically, pitch period entropy assesses the variability in pitch periods. It has been used in many classification studies, suggesting it is well suited to capturing the changing dynamics of vocal signals [55]. Evidence from two studies shows that the pitch period entropy value in pathological voices may be higher than that of healthy voices [60,61]. However, the long-term changes with worsening PD are unknown.
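The idea of measuring entropy over pitch values can be illustrated with a deliberately simplified Python sketch. It converts a fundamental frequency track to semitones relative to its median, builds a histogram and returns the normalised Shannon entropy. This is not the published pitch period entropy algorithm, which includes further processing steps; the function name, bin count and semitone range are hypothetical choices.

```python
import math
from statistics import median


def pitch_entropy(f0_hz, n_bins=30, semitone_range=6.0):
    """Normalised Shannon entropy (0 to 1) of a semitone-scaled F0 histogram."""
    if not f0_hz:
        raise ValueError("an F0 track is required")
    ref = median(f0_hz)
    semitones = [12.0 * math.log2(f / ref) for f in f0_hz]
    width = 2.0 * semitone_range / n_bins
    counts = [0] * n_bins
    for s in semitones:
        idx = int((s + semitone_range) / width)
        idx = min(max(idx, 0), n_bins - 1)  # values outside the range go to the edge bins
        counts[idx] += 1
    total = len(semitones)
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return entropy / math.log(n_bins)


# Hypothetical F0 tracks in Hz: a steady phonation and a less stable one.
steady = [120.0, 120.5, 119.8, 120.2, 120.1, 119.9]
unstable = [120.0, 131.0, 112.0, 126.0, 108.0, 135.0]
print(pitch_entropy(steady), pitch_entropy(unstable))
```

A perfectly stable pitch falls into a single bin and gives an entropy of zero, whereas a more erratic pitch spreads over several bins and gives a higher value.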

4.1.7. Recurrent Period Density Entropy

This feature quantifies signal complexity and uncertainty by measuring how often specified patterns recur in a signal, making it useful for quantifying dysphonia in pathological voices [89,90]. It is often selected for use in diagnostic classifiers, but the values of this feature for healthy and pathological voices have not been quantified. A disadvantage of this feature is that it is complex to interpret and can be difficult to relate back to the original voice signal.

4.2. Articulatory Features

These features provide insights into the movements and positioning of the articulatory organs, namely the tongue, lips and jaws, during speech. They offer an understanding of the physical aspects of speech sound production and clarity of speech.

4.2.1. Mel-Frequency Cepstral Coefficients

Cepstral analysis has long been a staple of speech analysis. It allows for the impulse response of the speech system to be isolated from the input and, in so doing, provides a spectral representation of the vocal tract. Mel-frequency cepstral coefficients are most often used to quantify this. They perform well in diagnostic classifiers and are frequently chosen features for this task [62,63,91,92]. However, quantitative comparisons between healthy and PD voices as well as longitudinal analyses are lacking.

4.2.2. Cepstral Peak Prominence

Cepstral peak prominence (CPP) is a feature used for evaluating overall voice quality. It is determined by measuring the height of the cepstrum peak in relation to the baseline levels. It has been widely used in the identification and analysis of speech dysphonia, and it has been investigated specifically for early detection of PD [70] and distinguishing between different PD subtypes [71]. CPP values have been found to be lower in pathological voices than in healthy voices [93], although to date, no long-term data for these values are available. It has also been noted that this feature may be susceptible to background noise present in recordings.

4.2.3. Bark Band Energy Features

Analysing the energy within specific frequency bands of speech signals provides a more detailed understanding of their characteristics. This analysis also aids in understanding how these sounds are perceived by listeners. For PD analysis, Bark band energies are often used. These measure the total energy contained in each of 25 frequency bands defined by the Bark scale, which divides the frequency range of audible sound into perceptually relevant groups, with narrower bands at lower frequencies and wider bands at higher frequencies. Although they have been used in classifiers to distinguish between PD and healthy voices [72,73], no work has quantitatively compared the energy in each band to identify where the differences lie and what they are.
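The band grouping described above can be sketched as follows. The Python example maps each frequency bin of a precomputed power spectrum to a Bark band and sums the power per band. The Hz-to-Bark conversion uses the Zwicker and Terhardt approximation, which is an assumption here; the reviewed studies may define their band edges differently. The function names and the example spectrum are hypothetical.

```python
import math


def hz_to_bark(f_hz):
    """Zwicker and Terhardt approximation of the Bark scale."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def bark_band_energies(freqs_hz, power, n_bands=25):
    """Sum spectral power into Bark bands; band k covers Bark values [k, k + 1)."""
    if len(freqs_hz) != len(power):
        raise ValueError("freqs_hz and power must have the same length")
    energies = [0.0] * n_bands
    for f, p in zip(freqs_hz, power):
        band = min(int(hz_to_bark(f)), n_bands - 1)
        energies[band] += p
    return energies


# Hypothetical power spectrum: bin centre frequencies (Hz) and power values.
freqs = [50.0, 150.0, 500.0, 1000.0, 2000.0, 4000.0, 8000.0]
power = [0.9, 1.2, 0.7, 0.4, 0.2, 0.1, 0.05]
print(bark_band_energies(freqs, power))
```

Because the bands widen with frequency, a fixed-resolution spectrum contributes only a few bins to each low band and many bins to each high band.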

4.2.4. Vowel Space Area

The vowel space represents the regions within the mouth where vowel sounds are articulated. In healthy voices, this space is distributed across the entire mouth and palate, allowing for a wide range of vowel sounds to be produced. In PD patients, this vowel space is often diminished, resulting in a more centralised production of vowel sounds. This reduction may be attributed to reduced control of the articulators, such as the tongue, which limits the ability to produce a wide range of vowel sounds.
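In acoustic studies, the vowel space area is commonly estimated from the first and second formant frequencies (F1, F2) of the corner vowels. The section above does not specify a formula, so the triangular area over /a/, /i/ and /u/ used in the Python sketch below is an assumption, as are the function name and the formant values.

```python
def triangular_vsa(a, i, u):
    """Area (Hz^2) of the triangle spanned by the (F1, F2) points of /a/, /i/ and /u/,
    computed with the shoelace formula."""
    (x1, y1), (x2, y2), (x3, y3) = a, i, u
    return 0.5 * abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))


# Hypothetical (F1, F2) values in Hz for a well-separated and a centralised vowel set.
typical = {"a": (800.0, 1300.0), "i": (300.0, 2300.0), "u": (350.0, 800.0)}
centralised = {"a": (650.0, 1350.0), "i": (400.0, 1900.0), "u": (420.0, 1000.0)}

print(triangular_vsa(typical["a"], typical["i"], typical["u"]))
print(triangular_vsa(centralised["a"], centralised["i"], centralised["u"]))
```

When the corner vowels move towards the centre of the formant plane, the triangle shrinks, which corresponds to the diminished vowel space described for PD patients.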

4.2.5. Vowel Articulation Index

The vowel articulation index is a measure of vowel centralisation and is often studied together with the vowel space area. It is calculated by taking a ratio of the formant frequencies of the vowels /a/ and /i/ to the formant frequencies of the other vowels. It is considered to be robust to inter-speaker variability, thus providing a good general measure for vowel articulation. The values of this feature have been shown to be lower in pathological voices than in healthy voices and to further decrease over time [74,77].
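One widely used definition of the vowel articulation index, assumed here and not confirmed by the reviewed studies, divides the formants that rise with clear articulation (F2 of /i/ and F1 of /a/) by those that fall (F1 of /i/, F1 and F2 of /u/ and F2 of /a/). The Python sketch below applies this definition; the function name and the formant values are hypothetical.

```python
def vowel_articulation_index(a, i, u):
    """VAI = (F2/i/ + F1/a/) / (F1/i/ + F1/u/ + F2/u/ + F2/a/).
    Each argument is an (F1, F2) pair in Hz."""
    f1_a, f2_a = a
    f1_i, f2_i = i
    f1_u, f2_u = u
    return (f2_i + f1_a) / (f1_i + f1_u + f2_u + f2_a)


# Hypothetical (F1, F2) values in Hz.
typical = vowel_articulation_index((800.0, 1300.0), (300.0, 2300.0), (350.0, 800.0))
centralised = vowel_articulation_index((650.0, 1350.0), (400.0, 1900.0), (420.0, 1000.0))
print(typical, centralised)
```

With these example values, the centralised vowel set yields the lower index, consistent with the direction of change reported for pathological voices.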

4.2.6. Perceptual Linear Prediction Coefficients

Perceptual linear prediction coefficients provide a representation of the vocal tract in the cepstral domain. This makes them a good choice for tracking changes in articulation [63]. These measures have been used in studies seeking to automatically detect PD, but how the coefficients differ between PD and healthy voices has not been clarified.

4.3. Prosodic Features

These features describe the rhythmic, melodic and expressive aspects of speech. They convey meaning, intention and emotion through speech and extend across whole spoken sentences or passages.

4.3.1. Maximum Phonation Time

This feature indicates how long a person can sustain a phonation sound, such as a vowel. In PD voices, this is reported to be shorter than in healthy voices [83], possibly due to impaired breathing regulation. The size of this reduction is uncertain and varies between individual patients and between different sounds.

4.3.2. Vocal Pitch Features

Features related to pitch, the voice’s fundamental frequency, quantify the variability in this frequency over a time segment as a measure of the patient’s ability to sustain a single frequency. These features are associated with vocal muscle control, which worsens with progressing PD. They are calculated from the pitch periods observed during sustained vowel phonations. As vocal control in PD patients declines over time and vocal fold oscillations become more erratic, the pitch periods may become less regular, leading to a change in the fundamental frequency of the voice and an increase in its variability or standard deviation [35,47,48].

Pitch variation may also be assessed over a longer spoken sentence. In this case, the changes to speech prosody are noted, and a reduced variability is observed [94,95,96,97]. This can result in the “monopitch” or “monotone” voice that is reported by both patients and observers of PD speech. It should be noted that these changes are more consistently reported in males than in females.
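The pitch variability measures discussed in this section can be summarised with a few descriptive statistics. The minimal Python sketch below takes a fundamental frequency (F0) track and returns its mean, standard deviation, coefficient of variation and standard deviation in semitones. It assumes that F0 has already been estimated for voiced frames only; the function name and the example tracks are hypothetical.

```python
import math
from statistics import mean, stdev


def f0_variability(f0_hz):
    """Descriptive statistics of an F0 track (voiced frames only, values in Hz)."""
    if len(f0_hz) < 2:
        raise ValueError("at least two F0 values are required")
    mu = mean(f0_hz)
    sd_hz = stdev(f0_hz)
    # Semitones relative to the speaker's own mean F0.
    semitones = [12.0 * math.log2(f / mu) for f in f0_hz]
    return {
        "mean_hz": mu,
        "sd_hz": sd_hz,
        "cv_percent": 100.0 * sd_hz / mu,
        "sd_semitones": stdev(semitones),
    }


# Hypothetical F0 tracks from a read sentence: expressive versus monotone delivery.
expressive = [110.0, 135.0, 150.0, 120.0, 98.0, 142.0, 105.0]
monotone = [118.0, 121.0, 119.0, 120.0, 122.0, 119.0, 120.0]
print(f0_variability(expressive)["sd_semitones"])
print(f0_variability(monotone)["sd_semitones"])
```

Expressing variability in semitones relative to each speaker's own mean makes values from speakers with different baseline pitch easier to compare.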

4.3.3. Speaking Rate

This feature represents the number of sounds a person can produce in a specified time. It is closely associated with the number and length of pauses contained in the spoken recording. It has been shown that the speaking rate is lower in PD patients than in healthy controls, potentially due to patients making a deliberate effort to control their speech [82]. However, to date, no correlation between this feature and PD severity has been identified, and further longitudinal assessments are required to evaluate long-term changes.

4.3.4. Pause Number and Length

The number and length of pauses in a recorded spoken sentence are used to assess the fluency of a patient’s speech. Pauses also affect the rhythm and interpretability of speech. Increased pause times between words or between syllables of words may indicate both decreasing motor control of the speech organs and cognitive decline. This feature is expressed either as the number of pauses counted in running speech or as the ratio of the total pause length to the total length of recorded speech. In Parkinsonian speech, these feature values tend to be higher than those in healthy speech.
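Both forms of this feature can be sketched from a sequence of short-time frame energies. The Python example below treats a run of low-energy frames as a pause when it lasts at least a minimum number of frames and returns the pause count and the pause-to-total ratio. The energy threshold, the minimum pause length and the function name are hypothetical choices, and the frame energies are assumed to be precomputed.

```python
def pause_statistics(frame_energy, threshold=0.01, min_pause_frames=15):
    """Return (number of pauses, pause frames / total frames).

    A pause is a run of at least `min_pause_frames` consecutive frames whose
    energy is below `threshold`. Leading and trailing silence is counted too,
    so recordings should be trimmed beforehand if that is not wanted.
    """
    if not frame_energy:
        raise ValueError("frame_energy must not be empty")
    pause_lengths = []
    run = 0
    for energy in frame_energy:
        if energy < threshold:
            run += 1
        else:
            if run >= min_pause_frames:
                pause_lengths.append(run)
            run = 0
    if run >= min_pause_frames:  # close a pause that reaches the end of the recording
        pause_lengths.append(run)
    return len(pause_lengths), sum(pause_lengths) / len(frame_energy)


# Hypothetical energies for 10 ms frames: speech, a 200 ms pause, speech,
# a 50 ms gap (too short to count), speech.
energies = [1.0] * 50 + [0.0] * 20 + [1.0] * 30 + [0.0] * 5 + [1.0] * 20
print(pause_statistics(energies))
```

The minimum length keeps brief closures within words, such as stop consonants, from being counted as pauses.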

4.3.5. Detrended Fluctuation Analysis

Detrended fluctuation analysis is a statistical method used to assess the self-similarity in long speech signals. This is performed by examining the variance in fluctuations over different lengths of a recorded speech segment and determining whether the observed fluctuations are correlated or random. In doing so, regular speaking rhythms can be determined, and unnatural or altered patterns can be identified. This feature may be used to describe the prosodic alterations observed in pathological speech and has been used in classifiers for that purpose. No quantitative data exist to establish baseline values or to show how values for pathological speech differ from those for healthy speech.
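The procedure described above can be sketched in a few steps: integrate the mean-subtracted signal, split it into windows of a given length, remove a linear trend from each window, compute the root-mean-square residual, and estimate the scaling exponent as the slope of log fluctuation against log window length. The Python sketch below follows these steps with first-order detrending. The window lengths and function names are hypothetical choices, and the reviewed studies may use a normalised variant of the exponent.

```python
import math
import random


def _linear_fit(xs, ys):
    """Least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def dfa_exponent(signal, scales=(16, 32, 64, 128, 256)):
    """Scaling exponent from first-order detrended fluctuation analysis."""
    mean_value = sum(signal) / len(signal)
    profile, acc = [], 0.0
    for value in signal:  # cumulative sum of the mean-subtracted signal
        acc += value - mean_value
        profile.append(acc)
    log_n, log_f = [], []
    for n in scales:
        n_windows = len(profile) // n
        if n_windows < 1:
            continue
        t = list(range(n))
        squared = 0.0
        for w in range(n_windows):
            segment = profile[w * n:(w + 1) * n]
            slope, intercept = _linear_fit(t, segment)
            squared += sum((y - (slope * x + intercept)) ** 2 for x, y in zip(t, segment))
        fluctuation = math.sqrt(squared / (n_windows * n))
        if fluctuation > 0.0:
            log_n.append(math.log(n))
            log_f.append(math.log(fluctuation))
    if len(log_n) < 2:
        raise ValueError("signal too short or too regular for the chosen scales")
    return _linear_fit(log_n, log_f)[0]


rng = random.Random(0)
white_noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]
print(dfa_exponent(white_noise))
```

For uncorrelated noise the exponent is expected to lie near 0.5, while strongly correlated signals such as a random walk give values near 1.5.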

4.4. Correlation Between Feature Changes and Progression Rating Scales

Several studies have examined the relationship between vocal feature values and severity scale scores, particularly regarding UPDRS and H&Y scales. Many different aspects of speech have been assessed for this purpose, including vowel articulation [74], syllable repetition stability [87], vocal acoustics during reading tasks and sustained vowels [46,47,48,53], prosodic features [84], speech rate and pause time [85,86] and frequency features.

Despite observing a decline in speech measures over time, the authors of these studies found no significant correlations with UPDRS or Hoehn and Yahr scores using statistical tests, such as Pearson correlation and Spearman rank calculations. One study did find a correlation between Hoehn and Yahr scores and perceptual acoustic evaluation but not with UPDRS [51].

These authors propose several possible reasons for the lack of correlation:

The UPDRS only includes one question related to speech and may not capture the subtle changes seen in vocal features.
The underlying pathophysiology of speech deterioration may differ from the motor symptoms assessed by the rating scales [74].
Vocal changes may not be affected by dopaminergic medications, which have the effect of stabilising the UPDRS scores [87].
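The two correlation tests mentioned in this section can be sketched as follows. The Python example computes the Pearson coefficient directly and the Spearman coefficient as the Pearson coefficient of the ranks, with tied values given their average rank. The function names are hypothetical, and the feature values and scores are invented for illustration only; they are not data from the reviewed studies.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("two sequences of equal length (>= 2) are required")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented example: a vocal feature value and a rating scale score per patient.
feature = [1.12, 1.05, 0.98, 1.01, 0.91, 0.95]
score = [12, 18, 25, 31, 40, 22]
print(pearson(feature, score), spearman(feature, score))
```

Spearman's coefficient depends only on the ordering of the values, which makes it the more natural choice for ordinal scales such as Hoehn and Yahr stages.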

4.5. Feature Impact on Classifier Model Performance

To evaluate the effectiveness of the vocal features described above, their impact on models for the diagnosis and monitoring of PD has been examined [28,51,55,61]. Figure 2 summarises the results of these studies and portrays the accuracy of support vector machine (SVM) classifiers in discriminating PD patients from healthy controls using different feature inputs. SVMs are considered here because they are the most frequently used classifiers for this task. The figure demonstrates that diagnostic accuracies above 75% are achieved in all cases, suggesting that vocal features can contribute to the early identification of PD.

It should be noted that the features are not used individually but rather combined into a feature vector, which serves as the input to the machine learning model. While the contribution of individual features is not the focus of this review, understanding their impact on the model’s final output can inform the decisions on which features to include. This feature selection further improves the explainability of the model.

4.6. Statistical Approaches in Vocal Feature Studies

The studies reviewed demonstrate several strengths. They consistently apply statistical analysis to compare feature values across different disease stages and against healthy controls, highlighting significant differences. The high-performance metrics achieved when using these features in different classification algorithms underscore their utility and merit. Moreover, these works provide objective confirmation of subjectively observed changes, offering a solid foundation for developing a comprehensive acoustic model of vocal deterioration in PD.

However, a notable weakness in this body of literature is the reliance on snapshot measurements, where each patient is recorded at a single point in time. Feature comparison is thus performed on cohorts of patients for each disease stage. This limitation restricts the ability to conduct long-term tracking of individual patients, which is essential for verifying trends in feature values over time. Addressing this weakness would enhance the validity and predictive power of the findings and would contribute to the development of more robust models of disease progression.

The results outlined above highlight the progress made and the ongoing challenges in long-term PD voice analysis, which are discussed in the following section.

5. Discussion

This review identified key vocal features used in diagnosing and monitoring Parkinson’s disease and highlighted their changes as the disease progresses. The features examined included both traditional voice quality measures and non-linear characteristics. Classical measures like jitter and shimmer were consistently chosen due to their ease of calculation and explanation. Non-linear signal analysis enables the detection of subtle vocal changes, accommodating the inherent complexity of voice signals. The changes reported are also in line with those described in perceptual evaluations and in the speech section of the UPDRS assessment. Increased noise and signal irregularity relate to the hoarseness or breathy quality of Parkinsonian speech. Imprecise articulation and reduced variation in speaking rates and pitch relate to the monotonic quality, lack of emotion and reduced interpretability of Parkinsonian speech. This underscores the connection between subjective assessments and objective metrics. However, the physiological interpretation of complex non-linear features remains challenging.

Of the reviewed studies, 17 examined vocal changes for the purpose of diagnosing PD, whereas 10 explored monitoring or estimating disease severity. This focus on early diagnosis reflects its clinical importance, enabling earlier treatment, mitigating disease progression and improving patient outcomes. From a research perspective, early diagnostic studies are often more feasible, as vocal features tend to display greater discriminatory capabilities at the outset of the disease compared to the more subtle and diverse changes that emerge as the disease progresses. However, the significance of long-term monitoring, particularly in assessing treatment efficacy, should not be overlooked, especially as novel treatments are introduced. Positive results from studies estimating UPDRS scores or H&Y stages using voice recordings suggest a correlation between these measures, potentially leading to a comprehensive acoustic model of vocal deterioration. Further research is necessary to determine the specific nature of this correlation and to strengthen the utility of voice-based assessments in ongoing disease management.

It should be noted that most studies describe the observed changes in vocal features qualitatively (e.g., “increased” or “decreased”, “statistically significant” or “not statistically significant”). The lack of quantification limits their practical use. Further investigation could determine baseline values for both the disease in general and specific patients, providing more specific objective information regarding the changes and their link to disease progression.
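As an illustration of what a quantitative report could look like, the sketch below computes two standard quantities: a standardised effect size (Cohen's d with a pooled standard deviation) for a group comparison, and the relative change of a feature from a patient's own baseline. The choice of these two measures, the function names and all numerical values are assumptions made for illustration; they are not taken from the reviewed studies.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardised mean difference between two groups, using the pooled variance."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    if pooled_var == 0:
        raise ValueError("pooled variance is zero")
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def relative_change(baseline, follow_up):
    """Change from a patient's own baseline, as a fraction of that baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (follow_up - baseline) / baseline


# Hypothetical jitter values (percent) for a PD cohort and a control cohort.
pd_jitter = [0.62, 0.71, 0.58, 0.80, 0.66]
control_jitter = [0.41, 0.38, 0.47, 0.44, 0.40]

print(f"Cohen's d (PD vs. control): {cohens_d(pd_jitter, control_jitter):.2f}")
print(f"Relative change from baseline: {100 * relative_change(0.50, 0.60):.0f} %")
```

Reporting values of this kind alongside the direction of change would make results comparable across studies and would allow patient-specific baselines to be tracked.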

This analysis could be extended to PD subtypes, which are categorised according to clinical features. An initial model of vocal deterioration may be used to differentiate Parkinson’s disease patients from healthy controls. As the model is further refined, it could offer a more detailed examination of each subtype, providing a more nuanced understanding of their distinctive features. Additionally, the inclusion of speech assessments may assist in differential diagnoses between PD and other conditions that present with similar symptoms, such as progressive supranuclear palsy and dementia with Lewy bodies. Studies have shown that speech in dementia with Lewy bodies shows reduced emotional expression [98], while in progressive supranuclear palsy, speakers show reduced articulatory velocity and precision [99]. This suggests that, with further refinement, these tools could be beneficial in the differential diagnosis of these conditions.

Across the reviewed articles, combinations of vocal features, rather than single features in isolation, yielded the most effective diagnostic and monitoring results. This multifaceted approach allows for a comprehensive consideration of vocal changes. However, a notable gap remains in understanding how variations in one vocal feature may influence others. Exploring these interrelationships would enhance the interpretability of vocal features, especially more complex ones. Including metadata, like patient age, gender, time since diagnosis and medications, may allow for more refined models, especially given the current focus on developing patient-centred treatment plans. This could also facilitate the study of longitudinal changes in vocal features, which is notably lacking for many features.

This review revealed a scarcity of long-term analyses, reflecting a broader issue: a lack of accessible data. Most Parkinson’s disease patients are elderly (over 65 years) and may struggle to attend regular clinic visits due to mobility challenges or health issues. Advanced disease stages may also deter participation in studies. Consequently, the studies reviewed often relied on datasets of limited size and availability. Such datasets and a lack of comparative studies restrict the generalisability of the findings. This underscores the need for large-scale longitudinal studies regularly collecting data from the same patients over extended periods. This limitation is exacerbated by the lack of an established experimental protocol. A wide range of feature extraction methods and discrimination models were tested for PD diagnosis and severity estimation. The lack of consistency between them could make it difficult to reach consensus among researchers. Addressing these limitations is crucial to exploiting the promise of machine learning and deep learning models for healthcare applications. Large, consistent datasets for training and validation along with a well-defined experimental and reporting framework would ensure accurate, comparable and explainable results.

Despite these challenges, the substantial research on vocal changes in PD supports the use of voice as a non-invasive and effective biomarker for Parkinson’s disease. The progress in digital speech technologies offers the opportunity for a more objective assessment of voice and speech, enabling precise monitoring over time. The future of vocal evaluation in PD includes the integration of these assessments with other PD scales, such as motor and cognitive evaluations, to provide a more comprehensive and patient-centred understanding of disease progression. This holistic approach allows for the development of personalised treatment plans, including pharmaceutical and surgical treatments, as well as speech therapy, which is commonly prescribed to address vocal challenges in PD. This therapy can be efficiently and quantitatively evaluated using digital tools. Moreover, its effects could be assessed not only in the context of improving the patients’ communication abilities but also in potentially slowing cognitive decline and alleviating other disease symptoms. This work fills a critical gap in the existing literature, and integrating these findings into severity prediction models could enhance both their performance and interpretability.

While every effort has been made to ensure a comprehensive review, several limitations exist. For instance, this work examined a limited number of rating scales. The decision to focus on the UPDRS and H&Y scales was based on their widespread use in both research and clinical practice. However, this choice resulted in the exclusion of other rating scales. Including these could provide additional insights into how vocal features change when disease progression is measured using alternative metrics. The decision to exclude perceptual or self-assessments was due to their subjective nature; however, comparing these with objective evaluations of vocal characteristics could yield valuable insights into the psychological effects of the speech changes. Similarly, neurological assessments were not included but could provide additional context for the role of vocal changes in PD progression.

This review only considered changes to voice and speech as PD worsens over time and did not consider the effects of medications or other therapeutic interventions. Thus, potential improvements were not discussed. These effects represent an additional dimension of vocal change that should be included in any complete model of the effects of PD.

Despite these limitations, this review provides a foundational overview of current research on vocal features as PD biomarkers. Future studies should aim to address them by incorporating additional rating scales and evaluating the effects of medications and therapies.

6. Conclusions

Vocal features extracted from speech recordings have emerged as promising, non-invasive biomarkers for PD. This review details how vocal features used in PD monitoring change as the disease progresses. Despite several existing gaps, particularly concerning long-term vocal changes, current insights provide a foundation to predict vocal deterioration in PD patients. These advancements promise more accurate and explainable monitoring methodologies, improving diagnosis and patient care.

Future research should prioritise longitudinal studies capturing long-term vocal changes. Such investigations would not only refine our understanding of how PD affects speech but also provide critical data needed to develop predictive models capable of tracking disease progression accurately. Furthermore, integrating voice-based biomarkers into clinical practice could offer practical benefits, such as early detection, improved treatment planning and enhanced patient monitoring, without the need for invasive procedures.

However, ongoing challenges remain significant. Variability in study methodologies and data quality can complicate efforts to standardise voice-based assessments across different populations or settings. Ensuring accessibility and usability requires collaboration between researchers, clinicians and technology developers to create user-friendly tools for diverse healthcare systems.

Overcoming these challenges through continued research will be crucial to realising the potential of voice-based biomarkers in managing Parkinson’s disease effectively. By fostering interdisciplinary collaboration and investing in further studies on long-term changes in vocal features associated with PD progression, we can move closer to developing robust diagnostic tools that improve patient outcomes while advancing our understanding of this complex neurological disorder.

Author Contributions

Conceptualisation, H.W. and V.A.; methodology, H.W.; data collection and review, H.W.; writing—original draft preparation, H.W.; writing—review and editing, V.A. and H.W.; supervision, V.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Alster, P.; Otto-Ślusarczyk, D.; Migda, A.; Migda, B.; Struga, M.; Friedman, A. Role of orexin in pathogenesis of neurodegenerative parkinsonisms. Neurol. Neurochir. Pol. 2023, 57, 335–343.
2. Rosca, E.C.; Simu, M. Parkinson’s disease-cognitive rating scale for evaluating cognitive impairment in Parkinson’s disease: A systematic review. Brain Sci. 2020, 10, 588.
3. Ramaker, C.; Marinus, J.; Stiggelbout, A.M.; van Hilten, B.J. Systematic evaluation of rating scales for impairment and disability in Parkinson’s disease. Mov. Disord. 2002, 17, 867–876.
4. Rissardo, J.P.; Caprara, A.L.F. Parkinson’s disease rating scales: A literature review. Ann. Mov. Disord. 2020, 3, 3–22.
5. Torbey, E.; Pachana, N.A.; Dissanayaka, N.N.W. Depression rating scales in Parkinson’s disease: A critical review updating recent literature. J. Affect. Disord. 2015, 184, 216–224.
6. Skorvanek, M.; Goldman, J.G.; Jahanshahi, M.; Marras, C.; Rektorova, I.; Schmand, B.; van Duijn, E.; Goetz, C.G.; Weintraub, D.; Stebbins, G.T.; et al. Global scales for cognitive screening in Parkinson’s disease: Critique and recommendations. Mov. Disord. 2018, 33, 208–218.
7. Chaudhuri, K.R.; Odin, P. The challenge of non-motor symptoms in Parkinson’s disease. Prog. Brain Res. 2010, 184, 325–341.
8. Kenny, L.; Azizi, Z.; Moore, K.; Alcock, M.; Heywood, S.; Jonsson, A.; McGrath, K.; Foley, M.J.; Sweeney, B.; O’Sullivan, S.; et al. Inter-rater reliability of hand motor function assessment in Parkinson’s disease: Impact of clinician training. Clin. Park. Relat. Disord. 2024, 11, 100278.
9. Luiz, L.M.D.; Marques, I.A.; Folador, J.P.; Andrade, A.O. Intra and inter-rater remote assessment of bradykinesia in Parkinson’s disease. Neurologia 2024, 39, 345–352.
10. Li, T.; Le, W. Biomarkers for Parkinson’s Disease: How Good Are They? Neurosci. Bull. 2020, 36, 183–194.
11. Ho, A.K.; Iansek, R.; Marigliani, C.; Bradshaw, J.L.; Gates, S. Speech impairment in a large sample of patients with Parkinson’s disease. Behav. Neurol. 1999, 11, 131–137.
12. Harel, B.; Snyder, P.J.; Cannizzaro, M. Variability in fundamental frequency during speech in prodromal and incipient Parkinson’s disease: A longitudinal case study. Brain Cogn. 2004, 56, 24–29.
13. Olanow, C.W.; Stern, M.B.; Sethi, K. The scientific and clinical basis for the treatment of Parkinson disease. Neurology 2009, 72, S1–S136.
14. Despotovic, V.; Skovranek, T.; Schommer, C. Speech Based Estimation of Parkinson’s Disease Using Gaussian Processes and Automatic Relevance Determination. Neurocomputing 2020, 401, 173–181.
15. Khalil, R.A.; Jones, E.; Babar, M.I.; Jan, T.; Zafar, M.H.; Alhussain, T. Speech Emotion Recognition Using Deep Learning Techniques: A Review. IEEE Access 2019, 7, 117327–117345.
16. Low, D.M.; Bentley, K.H.; Ghosh, S.S. Automated assessment of psychiatric disorders using speech: A systematic review. Laryngoscope Investig. Otolaryngol. 2020, 5, 96–116.
17. Kim, H.; Jeon, J.; Han, Y.J.; Joo, Y.; Lee, J.; Lee, S.; Im, S. Convolutional neural network classifies pathological voice change in laryngeal cancer with high accuracy. J. Clin. Med. 2020, 9, 3415.
18. Aharonson, V.; De Nooy, A.; Bulkin, S.; Sessel, G. Automated Classification of Depression Severity Using Speech-A Comparison of Two Machine Learning Architectures. In Proceedings of the 2020 IEEE International Conference on Healthcare Informatics (ICHI), Oldenburg, Germany, 30 November–3 December 2020.
19. Seedat, N.; Aharonson, V.; Hamzany, Y. Automated and Interpretable m-Health Discrimination of Vocal Cord Pathology Enabled by Machine Learning. In Proceedings of the 2020 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE), Gold Coast, QLD, Australia, 16–18 December 2020.
20. Batliner, A.; Schuller, B.; Seppi, D.; Steidl, S.; Devillers, L.; Vidrascu, L.; Vogt, T.; Aharonson, V.; Amir, N. The Automatic Recognition of Emotions in Speech. In Emotion-Oriented Systems; Petta, P., Cowie, R., Pelachaud, C., Eds.; Springer: Berlin/Heidelberg, Germany, 2011; pp. 71–99.
21. Pinkas, G.; Karny, Y.; Malachi, A.; Barkai, G.; Bachar, G.; Aharonson, V. SARS-CoV-2 Detection from Voice. IEEE Open J. Eng. Med. Biol. 2020, 1, 268–274.
22. Barsties, B.; De Bodt, M. Assessment of voice quality: Current state-of-the-art. Auris Nasus Larynx 2015, 42, 183–188.
23. Maryn, Y.; Roy, N.; De Bodt, M.; Van Cauwenberge, P.; Corthals, P. Acoustic measurement of overall voice quality: A meta-analysis. J. Acoust. Soc. Am. 2009, 126, 2619–2634.
24. Sidhu, M.S.; Latib, N.A.A.; Sidhu, K.K. MFCC in audio signal processing for voice disorder: A review. Multimed. Tools Appl. 2024, 1–21.
25. Arora, S.; Tsanas, A. Assessing parkinson’s disease at scale using telephone-recorded speech: Insights from the Parkinson’s voice initiative. Diagnostics 2021, 11, 1892.
26. Rusz, J.; Cmejla, R.; Ruzickova, H.; Klempír, J.; Majerova, V.; Picmausova, J.; Roth, J.; Ruzicka, E. Acoustic Analysis of Voice and Speech Characteristics in Early Untreated Parkinson’s Disease. In Proceedings of the 7th International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2011, Florence, Italy, 25–27 August 2011.
27. Moro-Velazquez, L.; Gomez-Garcia, J.A.; Arias-Londoño, J.D.; Dehak, N.; Godino-Llorente, J.I. Advances in Parkinson’s Disease detection and assessment using voice and speech: A review of the articulatory and phonatory aspects. Biomed. Signal Process Control 2021, 66, 102418.
28. Ngo, Q.C.; Motin, M.A.; Pah, N.D.; Drotár, P.; Kempster, P.; Kumar, D. Computerized analysis of speech and voice for Parkinson’s disease: A systematic review. Comput. Methods Programs Biomed. 2022, 226, 107133.
29. Amato, F.; Saggio, G.; Cesarini, V.; Olmo, G.; Costantini, G. Machine learning- and statistical-based voice analysis of Parkinson’s disease patients: A survey. Expert Syst. Appl. 2023, 219, 119651.
30. Jones, H.N. Prosody in Parkinson’s Disease. Perspect. Neurophysiol. Neurogenic Speech Lang. Disord. 2009, 19, 77–82.
31. Moro-Velazquez, L.; Dehak, N. A Review of the Use of Prosodic Aspects of Speech for the Automatic Detection and Assessment of Parkinson’s Disease. In Automatic Assessment of Parkinsonian Speech; Godino-Llorente, J.I., Ed.; Springer: Cham, Switzerland, 2020; pp. 42–59.
32. Narendra, N.P.; Schuller, B.; Alku, P. The Detection of Parkinson’s Disease from Speech Using Voice Source Information. IEEE/ACM Trans. Audio Speech Lang. Process. 2021, 29, 1925–1936.
33. Pinto, S.; Ozsancak, C.; Tripoliti, E.; Thobois, S.; Limousin-Dowsey, P.; Auzou, P. Treatments for dysarthria in Parkinson’s disease. Lancet Neurol. 2004, 3, 547–556.
34. Holmes, R.J.; Oates, J.M.; Phyland, D.J.; Hughes, A.J. Voice characteristics in the progression of Parkinson’s disease. Int. J. Lang. Commun. Disord. 2000, 35, 407–418.
35. Midi, I.; Dogan, M.; Koseoglu, M.; Can, G.; Sehitoglu, M.A.; Gunal, D.I. Voice abnormalities and their relation with motor dysfunction in Parkinson’s disease. Acta Neurol. Scand. 2008, 117, 26–34.
36. Hemmerling, D.; Wójcik-Pȩdziwiatr, M.; Jaciów, P.; Ziółko, B.; Igras-Cybulska, M. Monitoring of Parkinson’s Disease Progression Based on Speech Signal. In Proceedings of the 6th International Conference on Information and Computer Technologies (ICICT), Raleigh, NC, USA, 24–26 March 2023.
37. Hlavica, J.; Prauzek, M.; Peterek, T.; Musilek, P. Assessment of Parkinson’s disease progression using neural network and ANFIS models. Neural Netw. World 2016, 26, 111–128.
38. Nilashi, M.; Ibrahim, O.; Ahmadi, H.; Shahmoradi, L.; Farahmand, M. A hybrid intelligent system for the prediction of Parkinson’s Disease progression using machine learning techniques. Biocybern. Biomed. Eng. 2018, 38, 1–15.
39. Tsanas, A.; Little, M.A.; McSharry, P.E.; Ramig, L.O. Enhanced Classical Dysphonia Measures and Sparse Regression for Telemonitoring of Parkinson’s Disease Progression. In Proceedings of the 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, Dallas, TX, USA, 14–19 March 2010.
40. Bárcenas, R.; Fuentes-García, R.; Naranjo, L. Mixed kernel SVR addressing Parkinson’s progression from voice features. PLoS ONE 2022, 17, e0275721.
41. Bayestehtashk, A.; Asgari, M.; Shafran, I.; McNames, J. Fully automated assessment of the severity of Parkinson’s disease from speech. Comput. Speech Lang. 2015, 29, 172–185.
42. Nilashi, M.; Ibrahim, O.; Ahani, A. Accuracy Improvement for Predicting Parkinson’s Disease Progression. Sci. Rep. 2016, 6, 34181.
43. Nilashi, M.; Ibrahim, O.; Samad, S.; Ahmadi, H.; Shahmoradi, L.; Akbari, E. An analytical method for measuring the Parkinson’s disease progression: A case on a Parkinson’s telemonitoring dataset. Measurement 2019, 136, 545–557.
44. Silbergleit, A.K.; LeWitt, P.A.; Peterson, E.L.; Gardner, G.M. Quantitative analysis of voice in Parkinson disease compared to motor performance: A pilot study. J. Park. Dis. 2015, 5, 517–524.
45. Perez, C.; Roca, Y.C.; Naranjo, L.; Martin, J. Diagnosis and Tracking of Parkinson’s Disease by using Automatically Extracted Acoustic Features. J. Alzheimers Dis. Park. 2016, 260, 2160-0460.
46. Gamboa, J.; Jiménez-Jiménez, F.J.; Nieto, A.; Montojo, J.; Ortí-Pareja, M.; Molina, J.A.; García-Albea, E.; Cobeta, I. Acoustic Voice Analysis in Patients with Parkinson’s Disease Treated with Dopaminergic Drugs. J. Voice 1997, 11, 314–320.
47. Majdinasab, F.; Karkheiran, S.; Soltani, M.; Moradi, N.; Shahidi, G. Relationship Between Voice and Motor Disabilities of Parkinson’s Disease. J. Voice 2016, 30, e17–e768.
48. Tanaka, Y.; Nishio, M.; Niimi, S. Vocal acoustic characteristics of patients with Parkinson’s disease. Folia Phoniatr. Logop. 2011, 63, 223–230.
49. Jiménez-Jiménez, F.J.; Gamboa, J.; Nieto, A.; Guerrero, J.; Orti-Pareja, M.; Molina, J.A.; García-Albea, E.; Cobeta, I. Acoustic Voice Analysis in Untreated Patients with Parkinson’s Disease. Park. Relat. Disord. 1997, 3, 111–116.
50. Rani, K.U.; Holi, M.S. Analysis of Speech Characteristics of Neurological Diseases and Their Classification. In Proceedings of the 2012 Third International Conference on Computing, Communication and Networking Technologies (ICCCNT’12), Coimbatore, India, 26–28 July 2012.
51. Pah, N.D.; Motin, M.A.; Kumar, D.K. Phonemes based detection of Parkinson’s disease for telehealth applications. Sci. Rep. 2022, 12, 9687.
52. Rusz, J.; Cmejla, R.; Ruzickova, H.; Ruzicka, E. Quantitative acoustic measurements for characterization of speech and voice disorders in early untreated Parkinson’s disease. J. Acoust. Soc. Am. 2011, 129, 350–367.
53. Skodda, S.; Grönheit, W.; Mancinelli, N.; Schlegel, U. Progression of voice and speech impairment in the course of Parkinson’s disease: A longitudinal study. Park. Dis. 2013, 2013, 389195.
54. Rusz, J.; Tykalová, T.; Novotný, M.; Růžička, E.; Dušek, P. Distinct patterns of speech disorder in early-onset and late-onset de-novo Parkinson’s disease. npj Park. Dis. 2021, 7, 98.
55. Tsanas, A.; Little, M.A.; McSharry, P.E.; Spielman, J.; Ramig, L.O. Novel speech signal processing algorithms for high-accuracy classification of Parkinsons disease. IEEE Trans. Biomed. Eng. 2012, 59, 1264–1271.
56. Galaz, Z.; Mekyska, J.; Zvoncak, V.; Mucha, J.; Kiska, T.; Smekal, Z.; Eliasova, I.; Mrackova, M.; Kostalova, M.; Rektorova, I.; et al. Changes in phonation and their relations with progress of Parkinson’s disease. Appl. Sci. 2018, 8, 2339.
57. Viswanathan, R.; Arjunan, S.P.; Bingham, A.; Jelfs, B.; Kempster, P.; Raghav, S.; Kumar, D.K. Complexity measures of voice recordings as a discriminative tool for Parkinson’s disease. Biosensors 2019, 10, 1.
58. Rahn, D.A.; Chou, M.; Jiang, J.J.; Zhang, Y. Phonatory Impairment in Parkinson’s Disease: Evidence from Nonlinear Dynamic Analysis and Perturbation Analysis. J. Voice 2007, 21, 64–71.
59. Naranjo, L.; Pérez, C.J.; Campos-Roca, Y. Monitoring Parkinson’s disease progression based on recorded speech with missing ordinal responses and replicated covariates. Comput. Biol. Med. 2021, 134, 104503.
60. Tsanas, A.; Little, M.; McSharry, P.; Ramig, L. Accurate telemonitoring of Parkinson’s disease progression by non-invasive speech tests. IEEE Trans. Biomed. Eng. 2010, 57, 884–893.
61. Little, M.A.; McSharry, P.E.; Hunter, E.J.; Spielman, J.; Ramig, L.O. Suitability of dysphonia measurements for telemonitoring of Parkinson’s disease. IEEE Trans. Biomed. Eng. 2009, 56, 1015–1022.
62. Benba, A.; Hammouch, A.; Jilbab, A. Detecting Patients with Parkinson’s disease using Mel Frequency Cepstral Coefficients and Support Vector Machines. Int. J. Electr. Eng. Inform. 2015, 7, 308–322.
63. Moro-Velázquez, L.; Gómez-García, J.A.; Godino-Llorente, J.I.; Villalba, J.; Orozco-Arroyave, J.R.; Dehak, N. Analysis of speaker recognition methodologies and the influence of kinetic changes to automatically detect Parkinson’s Disease. Appl. Soft Comput. 2018, 62, 649–666.
64. Upadhya, S.S.; Cheeran, A.N.; Nirmal, J.H. Thomson Multitaper MFCC and PLP voice features for early detection of Parkinson disease. Biomed. Signal Process Control 2018, 46, 293–301.
65. Khan, T.; Lundgren, L.E.; Anderson, D.G.; Nowak, I.; Dougherty, M.; Verikas, A.; Pavel, M.; Jimison, H.; Nowaczyk, S.; Aharonson, V. Assessing Parkinson’s disease severity using speech analysis in non-native speakers. Comput. Speech Lang. 2020, 61, 101047.
66. Oung, Q.W.; Basah, S.N.; Muthusamy, H.; Vijean, V.; Lee, H. Evaluation of Short-Term Cepstral Based Features for Detection of Parkinson’s Disease Severity Levels through Speech Signals. IOP Conf. Ser. Mater. Sci. Eng. 2018, 318, 012039.
67. Liu, Y.; Reddy, M.K.; Penttila, N.; Ihalainen, T.; Alku, P.; Rasanen, O. Automatic Assessment of Parkinson’s Disease Using Speech Representations of Phonation and Articulation. IEEE/ACM Trans. Audio Speech Lang. Process. 2023, 31, 242–255.
68. Rueda, A.; Krishnan, S. Feature Analysis of Dysphonia Speech for Monitoring Parkinson’s Disease. In Proceedings of the 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Jeju, Republic of Korea, 11–15 July 2017.
69. Kodali, M.; Kadiri, S.R.; Alku, P. Automatic classification of the severity level of Parkinson’s disease: A comparison of speaking tasks, features, and classifiers. Comput. Speech Lang. 2023, 83, 101548.
70. Šimek, M.; Rusz, J. Validation of cepstral peak prominence in assessing early voice changes of Parkinson’s disease: Effect of speaking task and ambient noise. J. Acoust. Soc. Am. 2021, 150, 4522–4533.
71. Burk, B.R.; Watts, C.R. The Effect of Parkinson Disease Tremor Phenotype on Cepstral Peak Prominence and Transglottal Airflow in Vowels and Speech. J. Voice 2019, 33, e11–e580.
72. Vásquez-Correa, J.C.; Orozco-Arroyave, J.R.; Nöth, E. Convolutional Neural Network to Model Articulation Impairments in Patients with Parkinson’s Disease. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, International Speech Communication Association (ISCA), Stockholm, Sweden, 20–24 August 2017.
73. Orozco-Arroyave, J.R.; Hönig, F.; Arias-Londoño, J.D.; Vargas-Bonilla, J.F.; Skodda, S.; Rusz, J.; Nöth, E. Voiced/Unvoiced Transitions in Speech as a Potential Bio-Marker to Detect Parkinson’s Disease. In Proceedings of the Sixteenth Annual Conference of the International Speech Communication Association, Dresden, Germany, 6–10 September 2015.
74. Skodda, S.; Visser, W.; Schlegel, U. Vowel articulation in Parkinson’s disease. J. Voice 2011, 25, 467–472.
75. Roland, V.; Huet, K.; Harmegnies, B.; Piccaluga, M.; Verhaegen, C.; Delvaux, V. Vowel production: A potential speech biomarker for early detection of dysarthria in Parkinson’s disease. Front. Psychol. 2023, 14, 1129830.
76. Skodda, S.; Grönheit, W.; Schlegel, U. Impairment of vowel articulation as a possible marker of disease progression in Parkinson’s disease. PLoS ONE 2012, 7, e32132.
77. Rusz, J.; Cmejla, R.; Tykalova, T.; Ruzickova, H.; Klempir, J.; Majerova, V.; Picmausova, J.; Roth, J.; Ruzicka, E. Imprecise vowel articulation as a potential early marker of Parkinson’s disease: Effect of speaking task. J. Acoust. Soc. Am. 2013, 134, 2171–2181.
78. Whitfield, J.A.; Goberman, A.M. Articulatory-acoustic vowel space: Application to clear speech in individuals with Parkinson’s disease. J. Commun. Disord. 2014, 51, 19–28.
79. Mirarchi, D.; Vizza, P.; Tradigo, G.; Lombardo, N.; Arabia, G.; Veltri, P. Signal Analysis for Voice Evaluation in Parkinson’s Disease. In Proceedings of the 2017 IEEE International Conference on Healthcare Informatics (ICHI), Park City, UT, USA, 23–26 August 2017.
80. Hemmerling, D.; Wojcik-Pedziwiatr, M. Prediction and Estimation of Parkinson’s Disease Severity Based on Voice Signal. J. Voice 2022, 36, e9–e439.
81. Moro-Velazquez, L.; Gomez-Garcia, J.A.; Godino-Llorente, J.I.; Grandas-Perez, F.; Shattuck-Hufnagel, S.; Yagüe-Jimenez, V.; Dehak, N. Phonetic relevance and phonemic grouping of speech in the automatic detection of Parkinson’s Disease. Sci. Rep. 2019, 9, 19066.
82. Benba, A.; Jilbab, A.; Hammouch, A. Discriminating Between Patients with Parkinson’s and Neurological Diseases Using Cepstral Analysis. IEEE Trans. Neural Syst. Rehabil. Eng. 2016, 24, 1100–1108.
83. Yücetürk, A.; Yılmaz, H.; Eǧrilmez, M.; Karaca, S. Voice analysis and videolaryngostroboscopy in patients with Parkinson’s disease. Eur. Arch. Oto-Rhino-Laryngol. 2002, 259, 290–293.
84. Skodda, S.; Rinsche, H.; Schlegel, U. Progression of dysprosody in Parkinson’s disease over time—A longitudinal study. Mov. Disord. 2009, 24, 716–722.
85. Martínez-Sánchez, F.; Meilán, J.; Carro, J.; Íñiguez, C.G.; Millian-Morell, L.; Valverde, I.P.; López-Alburquerque, T.; López, D. Speech rate in Parkinson’s disease: A controlled study. Neurologia 2016, 31, 466–472.
86. Skodda, S.; Visser, W.; Schlegel, U. Gender-related patterns of dysprosody in Parkinson disease and correlation between speech variables and motor symptoms. J. Voice 2011, 25, 76.
87. Skodda, S.; Schlegel, U. Speech rate and rhythm in Parkinson’s disease. Mov. Disord. 2008, 23, 985–992.
88. Skodda, S.; Flasskamp, A.; Schlegel, U. Instability of syllable repetition as a marker of disease progression in Parkinson’s disease: A longitudinal study. Mov. Disord. 2011, 26, 59–64.
89. Little, M.A.; McSharry, P.E.; Roberts, S.J.; Costello, D.A.E.; Moroz, I.M. Exploiting nonlinear recurrence and fractal scaling properties for voice disorder detection. Biomed. Eng. Online 2007, 6, 23.
90. Little, M.A.; Costello, D.A.E.; Harries, M.L. Objective dysphonia quantification in vocal fold paralysis: Comparing nonlinear with classical measures. J. Voice 2011, 25, 21–31.
91. Jeancolas, L.; Benali, H.; Benekelfat, B.-E.; Mangone, G.; Corvol, J.-C.; Vidailhet, M.; Lehercy, S.; Petrovska-Delacrétaz, D. Automatic Detection of Early Stages of Parkinson’s Disease through Acoustic Voice Analysis with Mel-Frequency Cepstral Coefficients. In Proceedings of the 2017 International Conference on Advanced Technologies for Signal and Image Processing (ATSIP), Fez, Morocco, 22–24 May 2017.
92. Hawi, S.; Alhozami, J.; AlQahtani, R.; AlSafran, D.; Alqarni, M.; El Sahmarany, L. Automatic Parkinson’s disease detection based on the combination of long-term acoustic features and Mel frequency cepstral coefficients (MFCC). Biomed. Signal Process Control 2022, 78, 104013.
93. Murton, O.; Hillman, R.; Mehta, D. Cepstral peak prominence values for clinical voice evaluation. Am. J. Speech Lang. Pathol. 2020, 29, 1596–1607.
94. Abur, D.; Lester-Smith, R.A.; Daliri, A.; Lupiani, A.A.; Guenther, F.H.; Stepp, C.E. Sensorimotor adaptation of voice fundamental frequency in Parkinson’s disease. PLoS ONE 2018, 13, e0191839.
95. Mollaei, F.; Shiller, D.M.; Baum, S.R.; Gracco, V.L. Sensorimotor control of vocal pitch and formant frequencies in Parkinson’s disease. Brain Res. 2016, 1646, 269–277.
96. Harel, B.T.; Cannizzaro, M.S.; Cohen, H.; Reilly, N.; Snyder, P.J. Acoustic characteristics of Parkinsonian speech: A potential biomarker of early disease progression and treatment. J. Neurolinguistics 2004, 17, 439.
97. Skodda, S.; Grönheit, W.; Schlegel, U. Intonation and speech rate in Parkinson’s disease: General and dynamic aspects and responsiveness to levodopa admission. J. Voice 2011, 25, 199.
98. Kobayashi, M.; Yamada, Y.; Shinkawa, K.; Nemoto, M.; Ota, M.; Nemoto, K.; Arai, T. Speech and language characteristics differentiate Alzheimer’s disease and dementia with Lewy bodies. Alzheimer’s Dement. Diagn. Assess. Dis. Monit. 2022, 14, e12364.
99. Skodda, S.; Visser, W.; Schlegel, U. Acoustical analysis of speech in progressive supranuclear palsy. J. Voice 2011, 25, 725–731.

Figure 1. PRISMA review structure implemented for this review.

Figure 2. The accuracy of a support vector machine model in Parkinson’s disease prediction using different vocal features, based on results from [28,51,55,61].

Table 1. Vocal feature descriptions, variations and applications.

Feature Name	Description	Physiological Interpretation	Application	Changes Reported	References
Jitter	Variation in the period of glottal pulses	Irregular vocal fold oscillations	Diagnosis; Monitoring	Higher in PD than in healthy	[14,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51]
Shimmer	Variation in the amplitude of glottal pulses	Irregular vocal fold oscillations	Diagnosis; Monitoring	Higher in PD than in healthy	[14,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53]
Harmonics-to-Noise Ratio	Ratio between the energy of harmonic content and the energy of the noise content of a signal	Incomplete closure of the vocal folds	Diagnosis; Monitoring	Lower in PD than in healthy	[14,34,35,36,37,39,40,42,43,45,46,47,48,50,51,52,53,54]
Glottal-to-Noise Excitation Ratio	Measure of the contribution of vocal fold oscillations to the voice signal compared to the contribution of noise	Irregular vocal fold oscillations or incomplete closure of the vocal folds	Diagnosis	None recorded	[55,56,57]
Correlation Dimension	Instability of the vocal signal	Irregular vocal fold oscillations	Diagnosis	Higher in PD than in healthy	[58,59]
Pitch Period Entropy	Measure of the variation between pitch periods	Variation in the frequency of vibration of the vocal folds	Diagnosis; Monitoring	Higher in PD than in healthy	[14,36,37,40,42,43,55,60,61]
Recurrent Period Density Entropy	Measure of the complexity of the vocal signal	Irregular vocal fold oscillations	Diagnosis; Monitoring	None recorded	[14,36,37,40,42,43,55,60,61]
Mel-frequency Cepstral Coefficients	Power density spectrum of speech, presented on a perceptually relevant frequency scale	Spectral representation of the vocal tract	Diagnosis; Monitoring	None recorded	[38,45,62,63,64,65,66,67,68,69]
Cepstral Peak Prominence	Height of the cepstral peak in the total cepstrum of a voice signal	Indicator of voice quality	Diagnosis	Lower in PD than in healthy	[54,59,70,71]
Bark Band Energy Features	Measure of the energy contained in each of 25 perceptually relevant frequency bands	Differentiation between voiced and unvoiced segments of speech	Diagnosis	None recorded	[72,73]
Vowel Space Area	A measure of how distinctly different vowel sounds can be produced	Imprecise movements of the articulator organs	Diagnosis; Monitoring	Lower in PD than in healthy	[74,75,76,77,78,79]
Vowel Articulation Index	A measure of where in the mouth vowel sounds are produced	Imprecise movements of the articulator organs	Diagnosis	Lower in PD than in healthy	[74,75,76,77]
Perceptual Linear Prediction Coefficients	Spectral envelope of a speech signal with frequency axis adjusted to the Bark scale	Perceptual representation of the vocal tract	Diagnosis	None recorded	[65,80,81,82]
Maximum Phonation Time	Time that a vowel phonation can be sustained	Illustrates breathing capacity and control	Diagnosis	Shorter in PD than in healthy	[54,83]
Fundamental Frequency Variability	Vocal pitch and its variability	Frequency of the vibration of the vocal folds	Diagnosis; Monitoring	Increased variability over short vocal segments in PD as compared to healthy	[34,35,38,46,47,48,53,54,65,74,84]
Speaking rate	The number of speech sounds produced in a given time	Identification of altered speech patterns	Diagnosis; Monitoring	Lower in PD than in healthy	[53,84,85,86]
Number and Length of Pauses (Pause ratio)	Number and duration of pauses taken during speaking, both between and within words	Identification of altered speech patterns	Monitoring	Higher in PD than in healthy	[53,54,65,84,85,87,88]
Detrended Fluctuation Analysis	Measure of self-similarity and pattern identification in speech signals	Identification of altered speech patterns	Diagnosis	None recorded	[14,36,37,40,42,43,55,60,61]
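
As a rough illustration of the first two rows of Table 1, the sketch below computes jitter and shimmer in their widely used "local" relative form: the mean absolute difference between consecutive glottal periods (or peak amplitudes), divided by the mean period (or amplitude). It assumes the glottal periods and amplitudes have already been extracted from a sustained vowel. The function names and the sample values are hypothetical and are not taken from the reviewed studies, which may use other jitter and shimmer variants.

```python
from typing import Sequence


def _relative_perturbation(values: Sequence[float]) -> float:
    """Mean absolute difference of consecutive values, divided by their mean."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values[:-1], values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return mean_diff / mean_value


def local_jitter(periods_s: Sequence[float]) -> float:
    """Relative cycle-to-cycle variation of glottal period durations (seconds)."""
    return _relative_perturbation(periods_s)


def local_shimmer(amplitudes: Sequence[float]) -> float:
    """Relative cycle-to-cycle variation of glottal pulse peak amplitudes."""
    return _relative_perturbation(amplitudes)


if __name__ == "__main__":
    # Hypothetical values for a short sustained vowel (illustration only).
    periods = [0.0100, 0.0102, 0.0099, 0.0101]
    amplitudes = [0.80, 0.78, 0.82, 0.79]
    print(f"jitter:  {100 * local_jitter(periods):.2f} %")
    print(f"shimmer: {100 * local_shimmer(amplitudes):.2f} %")
```

Both measures are unitless ratios, usually reported as percentages. A perfectly periodic signal with constant amplitude yields zero for both, and larger values indicate less regular vocal fold oscillation.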

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

