Classification of Dysphonic Voices in Parkinson’s Disease with Semi-Supervised Competitive Learning Algorithm

This article proposes a novel semi-supervised competitive learning (SSCL) algorithm for vocal pattern classifications in Parkinson’s disease (PD). The acoustic parameters of voice records were grouped into the families of jitter, shimmer, harmonic-to-noise, frequency, and nonlinear measures, respectively. The linear correlations were computed within each acoustic parameter family. According to the correlation matrix results, the jitter, shimmer, and harmonic-to-noise parameters presented as highly correlated in terms of Pearson’s correlation coefficients. Then, the principal component analysis (PCA) technique was implemented to eliminate the redundant dimensions of the acoustic parameters for each family. The Mann–Whitney–Wilcoxon hypothesis test was used to evaluate the significant difference of the PCA-projected features between the healthy subjects and PD patients. Eight dominant PCA-projected features were selected based on the eigenvalue threshold criterion and the statistical significance level (p < 0.05) of the hypothesis test. The SSCL algorithm proposed in this paper included the procedures of the competitive prototype seed selection, K-means optimization, and the nearest neighbor classifications. The pattern classification experimental results showed that the proposed SSCL method can provide the excellent diagnostic performances in terms of accuracy (0.838), recall (0.825), specificity (0.85), precision (0.846), F-score (0.835), Matthews correlation coefficient (0.675), area under the receiver operating characteristic curve (0.939), and Kappa coefficient (0.675), which were consistently better than those results of conventional KNN or SVM classifiers.


Introduction
Parkinson's disease (PD) is a type of chronic neurological disorder that is progressively caused by the deterioration of neurotransmitter dopaminergic nerve cells in the substantia nigra of the brain [1]. In the early stages, PD patients occasionally suffer noticeable tremors in just one hand. Later on, PD patients gradually start experiencing some behavioral changes, with the primary symptoms including recurring tremors, stiffness, impaired balance, slow movement, freezing of gait, and dysphonia [2][3][4]. The dysfunction of body motor coordination in PD may also lead to several mental disturbances, with the secondary symptoms such as anxiety, depression, or fatigue. The development of muscle rigidity and articulatory hypokinesia in PD may cause some phonatory disorders such as dysphonia and hypokinetic dysarthria [5,6].
As a result of glottic insufficiency and vocal cord dysfunction in abduction or adduction, the altered acoustic amplitude and pitch frequency variations are commonly present in the speeches of PD patients [7,8]. In clinical practice, the changes of perceptual system and phonation impairment can be observed when PD patients pronounce sustained vowel sounds [5]. As an alternative clinical approach to medical imaging examination, the standard speech test is a low-cost and objective examination and monitoring solution, without any external radiation exposure from medical devices [9].
Voice records are often quantified in terms of various vocal parameters [10,11], which can be used as dominant biomarkers to detect the progressive changes of PD symptoms [12][13][14]. The acoustic signal amplitude (shimmer) parameters [15], pitch local perturbation (jitter) parameters [16], frequency cepstral parameters [17], vocal fold vibration periodicity (harmonics-to-noise ratio) parameters [18], and nonlinear dynamics measures [19], are the typical vocal parameters for quantifying the pathological conditions of dysphonia and dysarthria in PD [20,21]. The work of Rahn et al. [16] indicated that the jitter would significantly increase in PD patients. Little et al. [19] computed the correlation dimension and detrended fluctuation analysis (DFA) nonlinear features, and reported that the level of acoustic dynamics in the voice recordings of PD patients was much higher than that of healthy controls (HC). Cnockaert et al. [22] used the continuous wavelet transforms to study the fundamental frequency trace and low-frequency vocal modulation in sustained vowel voices, and they observed the significant phonatory disturbances in fundamental frequency of PD patients. Rodriguez-Perez et al. [23] reported that the mean, minimum, maximum, and standard deviation parameters of fundamental frequency modulation spectrum were significantly altered in PD. Viswanathan et al. [24] used the fractal dimension and normalized mutual information to quantify the complexity of acoustic signals in PD. They observed that the vocal fractal dimension of PD patients was significantly lower, and the normalized mutual information of sustained vowel voices in PD was significantly larger than in HC subjects [24]. They also compared the other nonlinear features such as normalized pitch period entropy and glottal closing quotient, and carried out the classification experiments using a support vector machine (SVM) based on a linear kernel. Their experimental results indicated that the fractal dimension and normalized mutual information were the dominant nonlinear features for detection of PD voices [24].
As some acoustic features are quantified using the similar signal processing tools, feature correlation analysis and selection procedures could be considered to reduce the feature dimensions and prevent the unnecessary computation costs [25,26]. Mohamadzadeh et al. [26] utilized a sparse representation technique to select the distinct features, and then categorized the vocal patterns of PD patients using an approximate message passing classifier. The vocal features may also help develop different effective computer-aided diagnostic tools that are capable of providing high sensitivity and specificity results for the detection of the symptomatic speech changes of PD patients [25,27].
Recently, advanced machine learning algorithms have been widely used to detection of vocal patterns for monitoring the PD progression and assessing the disease severity [20,[28][29][30][31]. How to design a sensitively and reliably machine learning system for phonation impairment detection is still an engineering challenge. Vaiciukynas et al. [32] used the random forest algorithm to analyze the abnormal sustained vowel voices of PD patients. They reported that the nonlinear projection of a proximity matrix into the two-dimensional feature space can provide excellent feature visualizations to support the medical decision-making. Berus et al. [29] studied the vocal feature correlations using different correlation coefficients, and implemented dimensionality reduction using principal component analysis (PCA) and self-organizing maps. They also developed a number of feedforward artificial neural networks with different parameter configurations, and achieved an accuracy of 0.8647 in the voice classification tasks. Hires et al. [33] developed a hybrid system that combined multiple convolutional neural networks (CNNs) to distinguish the vocal patterns of PD patients. In order to train each CNN, they applied a multiple-fine-tuning approach to reduce the semantical gap between the source and target tasks. They reported that such a CNN ensemble system did not require any feature extraction procedure, and can also achieve excellent diagnostic results [33]. Sheibani et al. [34] designed a classifier ensemble system to identify the pathological vocal patterns with frequency features. Their experiments demonstrated that the ensemble system may provide an accuracy of 0.906, which was better than a single classifier. Machine learning algorithms may assist in designing portable computer-aided systems for speech monitoring as well. Lauraitis et al. [35] developed a mobile application with Android operating system to record the body movement and speech data from patients with neurological disorders and HC subjects. Their experiments indicated that the finger tapping, energy expenditure, and vocal features were useful for different neural impairment monitoring.
In this paper, we propose a novel semi-supervised learning method based on competitive learning to detect the dysphonic voice recordings of PD patients. The acoustic parameters extracted from 240 voice records of PD patients and age-matched HC subjects were analyzed by grouping into the families of jitter, shimmer, harmonic-to-noise ratio, frequency cepstral parameters, and nonlinear measures. The Pearson correlation analysis and the PCA method were applied to remove the linear correlations between vocal parameters and reduce the feature dimensions, respectively. The dominant PCA-projected features were selected based on the Mann-Whitney-Wilcoxon hypothesis test (p-values < 0.05) for further pattern classification tasks. The semi-supervised competitive learning method for detection of vocal patterns in PD contains the procedures of the competitive prototype seed selection, K-means optimization, and the nearest neighbor classifiers. The effectiveness of the proposed machine learning method was evaluated in terms of confusion matrix and several prevailing diagnostic metrics.

Materials
The set of dysphonic voice data named "Parkinson Dataset with replicated acoustic features Data Set" was provided by Naranjo et al. [36], which is publicly available via University of California, Irvine (UCI) machine learning repository [37]. As reported by Naranjo et al. [36], a total of 80 subjects participated in the voice recording experiment. The number of HC subjects (n = 40) matched that of PD patients (n = 40). The HC subject group consisted of 22 (55%) healthy males and 18 (45%) healthy females, with the averaged age of 66.38 ± 8.38 years old. The age-matched PD group (mean ± SD: 69.58 ± 7.82 years old) was composed of 27 (67.5%) male and 13 (32.5%) female patients, respectively, who were recruited from the Regional Association for Parkinson's Disease in Extremadura, Spain. At least two of the primary symptoms of resting tremor at 4-6 Hz, muscle rigidity, postural instability, or bradykinesia were manifested in these PD patients. All these subjects provided their signed informed consent sheets, by complying with the protocol of the voice recording experiments [36]. The protocol documents were reviewed and approved by the Bioethical Committee from the University of Extremadura, Spain, as claimed in the previous study of Naranjo et al. [36]. The population statistics of HC subjects and PD patients are listed in Table 1. During the voice recording process, the subjects pronounced the sustained vowel sounds at their normal speed for 5 s. Each subject was requested to repeat the vowel sounds for three times, so that three separate voices were recorded from one subject. Thus, the dysphonic voice data set contains 240 voice records in total. According to Naranjo et al. [36], the voices were acquired by a microphone (AKG Model: 520) and recorded by a laptop with an external sound card (TASCAM Model: US322). The raw voice recordings were sampled at 44.1 kHz and digitalized with a resolution of 16 bits per sample [36].
The acoustic parameters derived from the voice recordings were grouped into the following five families: pitch local perturbation (jitter) measures, amplitude local per-turbation (shimmer) measures, harmonic-to-noise (HNR) features, nonlinear measures, Mel-frequency cepstral coefficient (MFCC), and frequency delta measures [17,36]. The nonlinear parameters were computed using the prevailing measures such as recurrence period density entropy (RPDE), DFA, pitch period entropy (PPE), and glottal-to-noise excitation ratio (GNE), respectively. Table 2 lists the abbreviations of the acoustic parameters and the detailed descriptions for the jitter, shimmer, HNR, nonlinear, and frequency MFCC and Delta families, respectively. Harmonic-to-noise ratio in 0-500 Hz HNR15 Harmonic-to-noise ratio in 0-1500 Hz Harmonic-to-noise HNR25 Harmonic-to-noise ratio in 0-2500 Hz HNR35 Harmonic-to-noise ratio in 0-3500 Hz HNR38 Harmonic-to-noise ratio in 0-3800 Hz

Methods
The present study developed the methodological procedures of feature computing with dimensionality reduction, pattern analysis based on semi-supervised learning, classification performance evaluation and result analysis [38]. The flowchart of the proposed method is shown in Figure 1. Highly correlated?

Yes No
Feature selection with Mann-Whitney-Wilcoxon hypothesis test

Selected vocal features
Pattern analysis with semisupervised competitive learning

Classification evaluation and result analysis
Feature computing and selection Not significant Discard features Significant difference Figure 1. Flowchart of the voice detection procedures that contain vocal parameter analysis, dimensionality reduction, feature selection using the Mann-Whitney-Wilcoxon hypothesis test, pattern analysis based on semi-supervised competitive learning, and classification result evaluation.

Feature Computing and Selection
According to the previous work of Naranjo et al. [17], the vocal parameters computed from the voice recordings of each subject in the data set would manifest high correlations between each other. In the present study, we grouped the vocal parameters into five families, i.e., the jitter, shimmer, harmonic-to-noise, nonlinear, and frequency parameter families. Then, we studied the linear correlations of the vocal parameters within each family by calculating the Pearson correlation coefficients as where cov(V a , V b ) is the covariance of a pair of vocal parameters V a and V b , and SD denotes the standard deviation.
Regarding those vocal parameter families that presented strong linear correlations (|ρ| > 0.75), it is necessary to decrease the correlated redundancy [39]. In the present study, we applied the PCA method to reduce the dimensions in feature space. The PCA is an orthogonal linear transformation that derives the eigenvalues and the corresponding eigenvectors from the covariance matrix of the data. The components can be projected based on the weight matrix in accordance with the eigenvalues sorted from largest to smallest. The dimension reduction procedure was implemented according to the 90% percentage threshold criterion, i.e., the first h principal components would be kept if the sum of their eigenvalues over the sum of total eigenvalues reached a certain percentage of 90% as: where Q denotes the total number of eigenvalues λ V q . The selection of the PCA-projected acoustic parameters, along with the remaining weakly correlated parameters, was performed using statistical analysis of the Mann-Whitney-Wilcoxon hypothesis test, which may estimate whether the statistic value from two populations was equal without a strict normal distribution assumption. The level of statistical significance was set as p-value < 0.05. Only the vocal parameters that presented the significant difference would be selected to construct the vector of input features for further pattern analysis.

Competitive Selection of Initial Prototype Seeds
Given a L-dimensional data set of size N, expressed as the set X = {x 1 , x 2 , · · · , x N }, let us define the L2 norm to measure the Euclidean distance between a pair of data patterns in the L feature dimensions as The indicator function of I(x m , x n , η) is defined to represent the proximity degree between two data patterns as follows If the Euclidean distance between x m and x n is not larger than an assigned range η, the indicator function equals 1, otherwise the indicator value remains zero. Then, the density of x m can be calculated as the sum of its indicator values with regards to all of the patterns in the data set, i.e., A candidate seed, x m , is selected according to the following indicator function: where γ (1 < γ ≤ N) is the density threshold parameter. The set of all available candidate seeds, C (C ⊆ X), is composed of the data patterns whose densities reach over the threshold as Let us define A(x m ) to be the set of all adjacent candidate seeds that locate in the range η of the candidate seed C(x m ), i.e., It is worth noting that the candidate seed x m itself also belongs to the adjacent set A(x m ), and A(x m ) is a subset of the entire data set X, i.e., x m ∈ A(x m ) and A(x m ) ⊂ X.

K-Means Optimization Algorithm
The qualified prototype seeds can be selected during the competitive learning process based on the K-means optimization algorithm. For a data pattern x n that belongs to the prototype k, its prototype indicator parameter can be written as The effectiveness and sensitivity of the K-means learning algorithm mainly depend on the choice of the number of prototypes K and the initialization of the centroid seeds, respectively. Let S (S ⊂ X) denote the set of winning prototype seeds. In the K-means procedure, the number of total prototypes is determined by the size of the set of prototype seeds S, instead of arbitrary assignment. Additionally, the members of S obtained by the previous competitive selection process are used as the initial centroids of the prototypes for K-means optimization, i.e., {s 0 k } = S * , because the data patterns with the relatively higher densities are most likely close to the true prototype centroids.
The K-means learning algorithm iteratively searches for the optimal partitions of data patterns by following the minimum sum of squared error criterion, and then updates the prototype centroids. In the ith K-means optimization iteration, the sum of squared errors between the data patterns x n and the prototype centroids s i k is optimized with respect to the indicators θ i kn as Then, the prototype centroids s i+1 k are updated for the i + 1th iteration as Such an optimization iteration and the corresponding centroid update are repeated until the location for each prototype centroid no longer changes.

Nearest Neighbor Classification
In the present study, we utilized the concept of nearest neighbor clustering to accomplish the pattern analysis tasks. The nearest neighbor algorithm aims to categorize the data patterns that belong to the prototype k into the target class labels based on the P-size training set, P = {x p , ω p } P p=1 . Such a training set contains the available data patterns x p with known labels ω p for the training purpose. In the case that the amount of training data (the patterns with known class labels) are less that the total number of testing data (the prototype patterns to be distinguished), i.e., P < N, we may use each prototype centroid as a representative of its own prototype patterns.
Let {x kp , ω kp } R p=1 denote the R (R ≤ P) nearest neighbors of a query prototype centroid s k , in terms of the first R smallest Euclidean distance between the training patterns and the prototype centroid s k , i.e., Letω = {ω t } T t=1 represent the set of target class labels, where T (T ≤ P) is the total number of target classes. We may define the class indicator function δ ω kp , ω t to indicate the circumstance if ω kp matches ω t as Then, the class label ω s k of the query prototype centroid s k is predicted by the majority vote of the labels of R nearest neighbors as Finally, all of the data patterns that belong to the prototype k are assigned with the same class label as the centroid s k as the overall classification results of our semi-supervised competitive learning method, i.e.,

Benchmark Classifiers for Comparison
In order to compare the classification results, we also applied the benchmark classifiers, in particular K-nearest neighbor (KNN) and SVM, on the same dysphonic voice data set. The parameter K = 7 was selected as the number of nearest neighbor patterns with known class labels, with the aim to best predict the classes of the testing patterns by means of majority voting.
The SVM is a typical feedforward neural network that consists of a single nonlinear hidden layer. Unlike the multilayer perceptron trained by the well-known back-propagation algorithm, the SVM operates only in a batch mode and follows the principle of structural risk minimization, which roots in the Vapnik-Chervonenkis dimension theory [40]. The SVM training is achieved by optimizing the margin hyperplane that separates the training data samples with several slack variables (also referred to as support vectors). For the nonlinear classification problems, the data samples are commonly projected into a highdimensional space by the inner-product kernel function such that the pattern classification can be converted into a linear separable problem.
In the present study, the SVM kernel was constructed with a radial basis function. The optimal spread parameter of the radial basis function was selected as σ = 4, so as to provide the best classification accuracy. The training and testing of the KNN, SVM, and SSCL methods were implemented by following the 10-fold cross-validation evaluation approach. In addition, an expert system based on Bayesian inference [36] and a two-stage variable selection and classification method [17] were used for a comparison of previous related studies on the same data set.

Classification Performance Evaluation Metrics
In order to evaluate the voice detection and classification results, we considered the prevailing pattern analysis metrics such as accuracy, recall, specificity, precision, F-score, the Matthews correlation coefficient, the area under the receiver operating characteristic (ROC) curve [4], and Cohen's Kappa coefficient.

Accuracy
The confusion matrix was calculated with the ratios of true positive (TP), true negative (TN), false positive (FP, also referred to as Type I error), and false negative (FN, referred to as Type II error). The overall accuracy of an entire data classification can be written as

Recall
The recall, also referred to as sensitivity, describes the true positive rate of a given classification model, the definition of which is formulated as

Specificity
The specificity presents the true negative rate of a classification model, which is given by

Precision
The precision is also referred to as positive predictive rate, i.e., the ratio of correct positive patterns to the total predicted positive patterns, defined by

F-Score
Because the numbers of HC and PD subjects and their voice records were matched in the UCI dysphonic voice data set, the F-score can be calculated as the harmonic mean of precision and recall based on the confusion matrix [41], derived as

Matthews Correlation Coefficient
The Matthews correlation coefficient (MCC) [42] was commonly used as a classification quality indicator for performance evaluation, which can be calculated as The typical random guess leads to a zero MCC value (MCC = 0). An effective binary classifier should provide a positive MCC value closer to perfect diagnosis (MCC = 1). The MCC is an important indicator because it considers all the true and false positives and negatives with the representation in the form of normalized correlation coefficient.

Area under ROC Curve
The ROC curve is a graphical plot of diagnostic trade-off between the clinical recall/sensitivity and specificity for every possible cut-off for a combination of binary classification tests. The area under the ROC curve (AUC) estimates the area underneath the entire ROC curve, which is commonly used for the evaluation of the diagnostic performance in many biomedical applications. Typically, a classifier that generates a ROC curve closer to the top-left corner in the plot and with a larger AUC indicates an excellent diagnostic performance.

Kappa Coefficient
In medical research, Cohen's Kappa coefficient is a benchmark metric that evaluates the level of agreement between the predicted classes and the truth classes [43]. The Kappa coefficient attempts to measures the ratio of the observed agreement over the expected agreement purely by chance.
The calculation of Kappa coefficient can be expressed as [43] The term p o is the observed agreement computed for the total number n of voice records as The term p e is the hypothetical probability of expected agreement due to chance, defined as The Kappa coefficient typically varies in the range from 0 and 1. A zero Kappa coefficient (Kappa = 0) indicates no agreement between the predicted classes and the truth classes, which can be interpreted to mean that the predictions made by a classifier are completely incorrect over the entire data set. A Kappa coefficient between 0.6 and 0.8 commonly implies a substantial agreement, and the Kappa coefficient over 0.8 toward 1 implies a near perfect agreement. In the present study, we computed the Kappa coefficient for each classification method to assess the classification performance. Figure 2 provides the color maps of the Pearson correlation coefficients computed from each pair of vocal parameters for the families of jitter, shimmer, HNR, nonlinear, MFCC, and frequency delta, respectively. It is worth noting that the jitter, shimmer, and HNR vocal parameters were highly correlated (ρ ≥ 0.92) between each other in each family. With regard to the MFCC and Delta families, the vocal parameters showed somewhat positive correlated but not strong enough (0 < ρ < 0.9). In contrast, the nonlinear vocal parameters of RPDE, DFA, PPE, and GNE only presented slight linear correlations with each other (−0.6 < ρ < 0.5). The similar acoustic signal preprocessing procedure for each family (listed in Table 2) was the major reason that caused the highly linear correlations of vocal parameters. On the other hand, the RPDE, DFA, PPE, and GNE parameters were computed by different nonlinear algorithms, so that these vocal parameters produced little correlated outcomes in the present study. In consequence, the RPDE, DFA, PPE, and GNE parameters were retained in the nonlinear parameter family, and the resting parameter families were projected by the PCA method into the principal component vectors in the lower dimensional feature space.

Feature Analysis Results
Because the jitter, shimmer, HNR, and frequency parameters manifested strong linear correlations as depicted in Figure 2, it is necessary to reduce the similarity effect of vocal parameters. By following the PCA dimension reduction criterion as described in Section 3.1, we only retained the primary principal components of the jitter, shimmer, and HNR parameter families for each, and annotated them as jitter-PCA, shimmer-PCA, and HNR-PCA, respectively. Regarding the frequency MFCC and Delta parameters, the first six and the first five principal components were chosen and annotated as Frequency-MFCC-PCA1 to 6 and Frequency-Delta-PCA1 to 5, respectively, as listed in Table 3. The PCA-projected features does not imply that they are separable between two classes in statistical sense. Table 3 provides the statistical hypothesis test results and the pvalues of the PCA-projected vocal features. The Mann-Whitney-Wilcoxon hypothesis test results indicated that the jitter-PCA, shimmer-PCA, and HNR-PCA consistently showed the significant difference (p-value < 0.05) between the HC and PD subject groups. Only the PPE and GNE nonlinear parameters of PD group were significantly different from those of HC group. However, the RPDE and DFA parameters of PD patients still maintained some degree of overlapping with those of HC subjects. Moreover, it is worth noting that the MFCC-PCA1, Delta-PCA1, and Delta-PCA5 parameters of the PD group were significantly different from those of the HC group, whereas the resting frequency parameters of MFCC and Delta families just showed slight differences that did not reach the statistical significance level of p-value < 0.05. Therefore, we selected the jitter-PCA, shimmer-PCA, HNR-PCA, PPE, GNE, MFCC-PCA1, Delta-PCA1, and Delta-PCA5 as the dominant features for pattern analysis based on the machine learning algorithms. Table 3. Mann-Whitney-Wilcoxon hypothesis test results of the vocal features derived from the PCA approach. The p-value < 0.05 indicates the significant difference, marked with * . Null hypothesis: Data samples from two subject groups are not significantly different in statistical sense; 1: rejects the null hypothesis, with the corresponding p-value marked with stars, 0: accepts the null hypothesis.

Vocal Features Null Hypothesis p-Value
Jitter

Classification Results and Discussions
Based on the selected eight dominant features, the pattern classification tasks were accomplished by the proposed SSCL method and the benchmark KNN and SVM classifiers. Table 4 demonstrated the classification results provided by different pattern analysis methods, along with the results reported in the previous studies [17,36] for comparison purpose. The best classification results were marked in bold numerical values in the table. It is worth noting that most classification results such as accuracy, recall, specific, and precision reported by Narajo et al. [17,36] just ranged from 0.75 to 0.8. The results of the two-stage method were slightly better than those obtained with the Bayesian expert system. In the present study, the KNN (K = 7), the SVM with radial basis function kernels, and the proposed SSCL method consistently outperformed the expert system with Bayesian inference, on the same data set. All of the KNN, SVM, and SSCL method provided higher values of accuracy, recall/sensitivity, and specificity, precision, and MCC, compared with the two-stage method used in [17]. In particular, the SVM ameliorated the classification performance with improvements of 0.046, 0.035, 0.058, 0.036 for accuracy, recall, specificity, and precision, respectively, versus the results of two-stage method. The increments of accuracy, recall, precision provided by the proposed SSCL method over the SVM classifier were 0.013, 0.025, and 0.004, respectively. Although the precision and AUC values of the KNN were a bit lower than those of the two-stage method, the KNN still provided a recall of 0.812 as true positive rate, which was much better than that of the Narajo's two-stage method (recall: 0.765). Such classification results obtained with the benchmark KNN and SVM classifiers, along with the proposed SSCL method, demonstrated the merits of the correlation analysis and vocal feature dimension reduction procedures. In particular, the proposed SSCL method achieved the best in four confusion matrix metrics (accuracy: 0.838, recall: 0.825, specificity: 0.85, precision: 0.846), compared with the other benchmark classifiers or the latest research works [17,36]. It can be observed in Table 4 that the Kappa coefficients of the KNN, SVM, and SSCL methods were over 0.6, which indicated that three classifiers can provide the substantial agreements between the predicted classes and the truth classes. The SSCL method also provided the largest Kappa coefficient value (Kappa = 0.675), in comparison with either of the KNN or SVM classifier. It is also worth noting that the SD values of all classification metrics provided by the proposed SSCL method were consistently smaller than the other classifiers. Such results demonstrated that the SSCL method can effectively distinguish the overall vocal patterns between the HC and PD subject groups. According to Figure 3, it can be observed that the ROC curve provided by the proposed SSCL method was consistently superior to the other two benchmark classifiers, especially when the false positive rate (i.e., 1−specificity) ranged from 0.1 to 1.0. Moreover, the proposed SSCL method improved by an AUC increase of 0.06 over the two-stage method proposed by Narajo et al. [17].  Our experimental results were also comparable to the other relevant research works in the literature. Viswanathan et al. [24] reported that the linear-kernel SVM input with the combination of fractal dimension and features extracted from three sustained voices provided the best classification accuracy of 0.81 and AUC value of 0.84, respectively. In our experiments, both of the SVM with radial basis function kernels and the proposed SSCL methods produced better overall accuracy results than the work of Viswanathan et al. [24]. Furthermore, the SVM and SSCL methods have greatly increased the AUC values up to 0.868 and 0.939, respectively.
Berus et al. [29] implemented the similar feature dimension reduction based on PCA and self-organizing maps, and used a group of artificial neural networks to distinguish vocal patterns between 20 HC subjects and 20 PD patients. Our proposed SSCL method provided a classification accuracy of 0.838, which is comparable to the accuracy result of 0.8647 reported by Berus et al. [29].
In addition, we studied the classification results of the KNN, SVM, and SSCL methods by considering the gender factor as well. Table 5 lists the misclassified voice records with regard to gender provided by the KNN, SVM, and SSCL methods, respectively. The misclassified voice recordings for males and females, as well as for HC and PD subject groups, were computed in percentage over the total incorrect voice recordings for each classifier. It is worth noting that all of the three classifiers tended to be good at distinguish vocal patterns in female, especially for PD patients. The misclassification ratios of males and females produced by both of the KNN and SVM were comparable for HC subjects. On the other hand, for PD patients, the misclassification ratios of females were much smaller than those of males. Such results were confirmed by the observation of Rusz et al. [44] that the speech abnormalities of females were generally better distinguished in relation to those of males.

Conclusions
The quality of phonation and daily communications of PD patients could be affected by the dysphonia, which can be detected based on the vocal parameters and machine learning algorithms. In the present study, the vocal patterns between HC subjects and PD patients have been effectively categorized using the selected PCA-projected features and the semi-supervised learning algorithm. The Pearson correlation coefficients were calculated between acoustic parameter pairs for the families of jitter, shimmer, HNR, nonlinear, and frequency MFCC and Delta, respectively. The experimental results showed strong linear correlations of jitter, shimmer, HNR parameters with each other in their own families. Validated by the Mann-Whitney-Wilcoxon hypothesis test, the PCA-projected features that presented significantly different in PD were selected for the purpose of dimension reduction. The semi-supervised machine learning method that incorporated the procedures of competitive prototype selection, K-means optimization, and the nearest neighbor classification was proposed for the pattern classification task. According to our classification experiments, the semi-supervised learning method was able to provide higher accuracy, recall, specificity, F-score, and Matthews correlation coefficient, and area under ROC curve, which was superior to the prevailing KNN and SVM classifiers. The proposed SSCL method also demonstrated better diagnostic performances than the results reported by Naranjo's previous related studies.
It is believed that the feature selection and SSCL methods presented in this paper can also be used for classification of vocal patterns in other diseases. Because several vocal parameters are measured with similar settings, the feature selection method is able to reduce the correlated dimensions and avoid unnecessary computational costs. Then, the SSCL method can make effective classifications based on the dominant PCA-projected features. Validations of the feature selection and SSCL methods for analysis of the voice characteristics related to other diseases could be involved in the future work.
The future work could also focus on the nonlinear measure of voice disturbances associated with PD severity stages, and the advanced deep learning paradigms for the design of sensitively and reliably phonation impairment detection and monitoring systems.