Parkinson’s Disease Diagnosis Using Laplacian Score, Gaussian Process Regression and Self-Organizing Maps

Parkinson’s disease (PD) is a complex degenerative brain disease that affects nerve cells in the brain responsible for body movement. Machine learning is widely used to track the progression of PD in its early stages by predicting unified Parkinson’s disease rating scale (UPDRS) scores. In this paper, we aim to develop a new method for PD diagnosis with the aid of supervised and unsupervised learning techniques. Our method is developed using the Laplacian score, Gaussian process regression (GPR) and self-organizing maps (SOM). SOM is used to segment the data to handle large PD datasets. The models are then constructed using GPR for the prediction of the UPDRS scores. To select the important features in the PD dataset, we use the Laplacian score in the method. We evaluate the developed approach on a PD dataset including a set of speech signals. The method was evaluated through root-mean-square error (RMSE) and adjusted R-squared (adjusted R²). Our findings reveal that the proposed method is efficient in the prediction of UPDRS scores through a set of speech signals (dysphonia measures). The method evaluation showed that SOM combined with the Laplacian score and Gaussian process regression with the exponential kernel provides the best results for R-squared (Motor-UPDRS = 0.9489; Total-UPDRS = 0.9516) and RMSE (Motor-UPDRS = 0.5144; Total-UPDRS = 0.5105) in predicting UPDRS compared with the other kernels in Gaussian process regression.


Introduction
Parkinson's disease (PD) is a complex degenerative brain disease with increasing motor symptoms that can significantly impair patients' quality of life [1,2]. Aging has been linked to a number of negative health consequences, including those affecting the nervous system [3]. The number of people affected by these conditions is expected to increase as the global population ages. The most significant risk factor for developing PD appears to be age. The disease is typically diagnosed in people over the age of 60 [4][5][6], but it can affect younger people as well; 20% of patients are diagnosed with PD before the age of 50. PD affects 6.3 million people worldwide [7], and the disease's impact on quality of life and life expectancy, as well as its social and monetary costs, is expected to grow as the population ages. According to estimates, there will be 8.7 million PD patients worldwide by 2030 [8], and the number of PD patients in the US is predicted to rise to around 1.8 million by 2030 [9]. There is no single specific test that can diagnose PD. Instead, a neurologist examines a patient's symptoms and medical history and performs a neurological examination in order to make a diagnosis.
Because of the high heterogeneity of PD, each individual may experience a variety of symptoms. Since the initial symptoms are mild, they can go undetected for long periods. Furthermore, at the diagnostic level, 60% of PD patients have a clear asymmetry of symptoms. There are numerous reported PD symptoms, both motor and nonmotor [10,11]. Constipation, sleep disorders, rapid eye movement (REM) sleep behavior disorders, bladder disorders (urinary incontinence) and anxiety are some examples of nonmotor symptoms. Note that nonmotor symptoms can sometimes precede motor symptoms and are thought to represent the disease's early stage. The secondary symptoms can include freezing of gait, gait dysfunction, hallucination, smell dysfunction, thinking difficulties, dementia, sexual dysfunction and depression. Although there is no cure for PD, there are treatments available to help manage its symptoms. The goal of treatment is either to replace the dopamine that is missing in the brain or to correct the problems caused by the lack of dopamine. Patients may be unaware of this disorder's most common symptom, which is reduced vocal loudness. In addition, people with Parkinson's disease commonly suffer from dysphonia [12], a vocal impairment characterized by a breathy voice and harshness. As the disease progresses, patients may experience greater difficulty speaking.
The unified Parkinson's disease rating scale (UPDRS), which measures the presence and severity of PD symptoms (but not their underlying causes), is the most popular tool used by clinicians to assess PD symptom severity [3,13,14]. The UPDRS consists of three sections that assess motor symptoms; activities of daily life; and mentation, behavior and mood. Monitoring the progression of PD is essential for better patient-directed care [3,13,15,16]. Remote monitoring is a convincing method for accurately and effectively tracking the progression of PD at more frequent intervals with less expense and resource waste. A growing option in general medical care is noninvasive telemonitoring, which may allow for reliable, affordable PD screening while potentially reducing the need for frequent clinic visits. As a result, the subject's condition is evaluated more accurately and the burden on national healthcare systems is reduced.
Machine learning has been demonstrated to be effective in disease diagnosis [15][16][17][18][19][20][21][22][23][24][25][26][27]. There have been many methods for PD diagnosis; some of them are presented in Table 1. The findings for the methods presented in this table show that there is no research on the combined use of clustering, feature selection and prediction machine learning for the prediction of UPDRS. As seen from this table, the previous research was mainly developed using prediction learning techniques. The use of clustering techniques can be effective in developing a robust learning method for UPDRS prediction. Clustering is effective because it allows PD diagnosis methods to identify groups of similar objects or data points in the PD dataset. By grouping similar data points, the underlying patterns and structures within the data can be better understood to make more informed decisions. Accordingly, this study aims to develop a new method using clustering, feature selection and prediction machine learning to predict UPDRS scores (Total-UPDRS and Motor-UPDRS) and to model the relationship between the characteristics of speech signals (dysphonia measures) and UPDRS scores. In this research, Motor-UPDRS is the motor section of the UPDRS, and Total-UPDRS is the full range of the UPDRS as described in [13]. Our method is developed using the Laplacian score, Gaussian process regression (GPR) and self-organizing map (SOM) techniques. The SOM technique is used to segment the data to handle large PD datasets. The models are then constructed using the GPR technique for UPDRS prediction. To select the important features in the PD dataset, we use the Laplacian score in the method. We perform several experiments on a PD dataset from the UCI machine learning archive, including a set of speech signals (dysphonia measures), to evaluate the developed method. The remainder of this paper is organized as follows.
In Section 2, the techniques incorporated in the proposed method are presented. Data analysis and method evaluations are performed in Section 3. In Section 4, the discussion section is presented. Finally, this work is concluded in Section 5.

Method
This study developed a hybrid method using unsupervised, feature selection and supervised learning techniques. The steps of the proposed method are shown in Figure 1. The data were collected from the UCI machine learning archive. In the first step of our methodology, data were clustered using the SOM clustering technique. We then used the Laplacian score for feature selection. To perform UPDRS prediction, GPR was implemented on the generated clusters. The proposed method was evaluated using root-mean-square error (RMSE) and adjusted R-squared. In this section, the techniques incorporated in the proposed method are introduced.

Gaussian Process Regression
The Gaussian process regression is a stochastic process that can be interpreted as probability distributions over functions with a number of random variables [80,81]. A joint Gaussian distribution exists for any finite range of these random variables. The Gaussian process regression is a machine learning approach that can be employed to deal with complex problems (e.g., nonlinear problems) [82]. It is developed on the basis of statistical theory and Bayesian theory [83]. This technique is widely used for regression problems [83,84].
A training dataset is required to establish a relationship between the input and output variables of the dataset. Assume that there is a dataset D with d-dimensional input vectors x_i ∈ R^d (d ≥ 1) and with y_i as the corresponding outputs. Then, we have:

D = {(x_i, y_i)}_{i=1}^{p}.    (1)

Thus, the output vector y = {y_i}_{i=1:p} and the matrix X = {x_i}_{i=1:p} are organized, respectively, from the y_i values and x_i vectors. Gaussian process regression employs a Gaussian prior, parameterized through a covariance function k(x, x′) and a mean function m(x), to model the latent function:

f(x) ∼ GP(m(x), k(x, x′)),    (2)

where m(x) is typically taken to be zero without affecting generality, and f(x) is known as the latent variable in the Gaussian process regression model. The similarity between input data points, which is a crucial component of the Gaussian process regression model, is described by the covariance function k(x, x′). Different kernel functions are used in Gaussian process regression. A common kernel function is the squared exponential (SE), represented as:

k(x, x′) = ψ₁² exp(−‖x − x′‖² / (2ψ₂²)),    (3)

where ψ₁ and ψ₂ indicate two hyperparameters that govern the accuracy of the output prediction. They need to be optimized in Gaussian process regression. During the training phase, the log-likelihood function

log p(y|X) = −(1/2) yᵀ(K + σₚ²I)⁻¹y − (1/2) log|K + σₚ²I| − (p/2) log 2π    (4)

is maximized for the estimation of the parameters of the kernel matrix K, where σₚ² indicates the variance of the noise and p indicates the number of training data points.
Equation (4) can be maximized using a gradient-based optimization algorithm. After training the model, at test points X*, the posterior distribution of f* = f(X*) is obtained as:

f* | X, y, X* ∼ N(f̄*, cov(f*)),
f̄* = K(X*, X)(K + σₚ²I)⁻¹y,
cov(f*) = K(X*, X*) − K(X*, X)(K + σₚ²I)⁻¹K(X, X*),    (5)

where cov(f*) indicates the prediction variance and f̄* denotes the prediction mean.
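To make the prediction step concrete, the following is a minimal NumPy sketch of Gaussian process regression with the SE kernel. It is an illustrative toy example, not the paper's tuned model: the hyperparameter values ψ₁ = ψ₂ = 1, the noise variance and the sine dataset are all assumptions chosen for demonstration.

```python
import numpy as np

def se_kernel(X1, X2, psi1=1.0, psi2=1.0):
    """Squared-exponential kernel with illustrative hyperparameters
    psi1 (signal amplitude) and psi2 (length scale)."""
    d2 = (np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :]
          - 2.0 * X1 @ X2.T)
    return psi1**2 * np.exp(-d2 / (2.0 * psi2**2))

def gpr_predict(X, y, X_star, noise_var=0.1):
    """Posterior mean and covariance of the GP at test points X_star."""
    K = se_kernel(X, X) + noise_var * np.eye(len(X))
    K_s = se_kernel(X, X_star)
    alpha = np.linalg.solve(K, y)          # (K + noise*I)^-1 y
    mean = K_s.T @ alpha                   # posterior mean
    cov = se_kernel(X_star, X_star) - K_s.T @ np.linalg.solve(K, K_s)
    return mean, cov

# Toy regression problem: noisy samples of a sine function
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (30, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)
X_star = np.linspace(-3, 3, 5)[:, None]
mean, cov = gpr_predict(X, y, X_star)
```

In practice, the hyperparameters would be chosen by maximizing the log-likelihood rather than fixed as here.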

SOM
As an unsupervised learning algorithm, SOM [85] is used to cluster and visualize high-dimensional datasets simultaneously [86][87][88][89][90][91]. The self-organization process used by the learning algorithm was biologically inspired by cells of the brain cortex. In contrast to the error-correction learning used in feedforward neural networks, this type of learning is referred to as competitive learning. A map is a grid-organized neural network made up of interconnected nodes, also known as cells, neurons or units. For visual purposes, the grid topology is typically two-dimensional [92,93], but it can have any topology. A prototype vector from the high-dimensional input space where the data live is assigned to each cell. The prototypes are updated to fit the training set during an iterative learning process; when a prototype is updated, the prototypes associated with neighboring cells are also updated using a specific weight. As the distance between grid cells increases, the weights decrease, and cells near each other on the map are linked to prototype vectors near each other in the input space. This allows the map to preserve the topology of the space. The resulting map, after convergence, allows for efficient visualization of the high-dimensional input space on a low-dimensional map. Because of its ease of use and interpretable results, SOM is a popular clustering and visualization tool. The SOM procedure for clustering is presented in Algorithm 1. In this algorithm, input patterns X = {x_1, …, x_N} are considered for the data clustering in SOM. The number of iterations t_max, learning rate ε(t) and neighborhood function σ(t) must be initialized in SOM to perform the data clustering. Note that each neuron w_i represents an arbitrary number of input patterns. The output of SOM is a trained map and clustered input patterns. In Algorithm 1, the learning rate and the radius of the neighborhood must both decrease at a constant rate for the algorithm to converge.
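The SOM training loop described above can be sketched as follows. This is a minimal illustration, assuming a Gaussian neighborhood function and exponentially decaying learning rate and radius; the grid size, decay schedules and seeds are illustrative choices, not the settings of Algorithm 1 in the paper.

```python
import numpy as np

def train_som(X, grid=(3, 3), t_max=500, eps0=0.5, sigma0=1.5, seed=0):
    """Minimal SOM: competitive learning on a 2-D grid of prototypes."""
    rng = np.random.default_rng(seed)
    n_units = grid[0] * grid[1]
    # Grid coordinates of each unit, used by the neighborhood function
    coords = np.array([(i, j) for i in range(grid[0])
                       for j in range(grid[1])], dtype=float)
    W = rng.standard_normal((n_units, X.shape[1]))  # prototype vectors
    for t in range(t_max):
        eps = eps0 * (0.01 / eps0) ** (t / t_max)       # decaying learning rate
        sigma = sigma0 * (0.3 / sigma0) ** (t / t_max)  # shrinking radius
        x = X[rng.integers(len(X))]                     # random input pattern
        bmu = np.argmin(np.sum((W - x) ** 2, axis=1))   # best-matching unit
        # Gaussian neighborhood: nearby grid cells get larger updates
        h = np.exp(-np.sum((coords - coords[bmu]) ** 2, axis=1)
                   / (2.0 * sigma ** 2))
        W += eps * h[:, None] * (x - W)                 # pull prototypes toward x
    # Assign each input pattern to its nearest prototype (cluster label)
    labels = np.argmin(((X[:, None, :] - W[None]) ** 2).sum(-1), axis=1)
    return W, labels

# Toy dataset: two well-separated blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
W, labels = train_som(X)
```

After training, prototypes near each other on the grid end up near each other in the input space, which is the topology-preservation property noted above.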

Laplacian Score for Feature Selection
The Laplacian score is based on Laplacian eigenmaps [94] and is considered a graph-based feature selection method. The Laplacian score models the data's local geometrical structure [95,96] with a k-nearest neighbor (k-NN) graph. Consider a dataset X = {x_1, …, x_N}; to approximate the dataset's manifold structure, a k-NN graph is constructed, which contains an edge with weight W_ij between x_i and x_j if x_i is one of x_j's k-nearest neighbors, or conversely. There are several similarity-based methods for determining edge weights. The Euclidean distance is one of the popular similarity metrics to measure the distance between two vectors [97,98]. Thus, for x_i and x_j and with τ as a suitable constant, we can define the weight matrix W as follows:

W_ij = exp(−‖x_i − x_j‖² / τ), if x_i and x_j are neighbors; 0, otherwise.    (6)

Two data points can be considered close to one another on a feature only if there is an edge connecting them. To select a good feature, the following objective function needs to be minimized:

L_r = Σ_ij (f_ri − f_rj)² W_ij / Var(f_r),    (7)

where f_ri indicates the ith sample of the rth feature in the dataset, f_r = (f_r1, …, f_rN)ᵀ, and Var(f_r) denotes the estimated variance of f_r. In order to maximize representative power, features with larger variance are preferred. On the graph, the variance can be estimated as Var(f_r) = f̃_rᵀ D f̃_r, where the mean of each feature is removed by

f̃_r = f_r − (f_rᵀ D 1 / 1ᵀ D 1) 1,    (8)

where D is the diagonal degree matrix of W, with D_ii = Σ_j W_ij, and 1 is a nonzero constant vector. The mean removal in Equation (8) avoids assigning a zero Laplacian score to a nonzero constant vector such as 1, because such a feature obviously contains no information. For a good feature, a bigger W_ij corresponds to a smaller (f_ri − f_rj), which is captured by

Σ_ij (f̃_ri − f̃_rj)² W_ij = 2 f̃_rᵀ L f̃_r,    (9)

where L is the Laplacian matrix, L = D − W. Accordingly, the rth feature's Laplacian score reduces to

L_r = f̃_rᵀ L f̃_r / f̃_rᵀ D f̃_r.    (10)
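The scoring procedure above can be sketched as follows. This is a minimal NumPy illustration: the values of k and τ and the k-NN construction details are assumptions, not the paper's configuration, and note that ranking conventions vary across implementations (a lower raw score means the feature better respects the graph's local structure).

```python
import numpy as np

def laplacian_score(X, k=5, tau=1.0):
    """Laplacian score of each feature of X (N samples x d features)."""
    N, d = X.shape
    # Pairwise squared Euclidean distances
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # k-NN graph: i and j are neighbors if either is among the other's k nearest
    order = np.argsort(d2, axis=1)
    knn = np.zeros((N, N), dtype=bool)
    for i in range(N):
        knn[i, order[i, 1:k + 1]] = True
    neighbors = knn | knn.T
    W = np.where(neighbors, np.exp(-d2 / tau), 0.0)  # heat-kernel edge weights
    D = np.diag(W.sum(axis=1))                       # degree matrix
    L = D - W                                        # graph Laplacian
    ones = np.ones(N)
    scores = np.empty(d)
    for r in range(d):
        f = X[:, r]
        f_t = f - (f @ D @ ones) / (ones @ D @ ones)  # remove weighted mean
        scores[r] = (f_t @ L @ f_t) / (f_t @ D @ f_t)
    return scores

# Toy check: feature 0 follows the data's cluster structure, feature 1 is noise
rng = np.random.default_rng(2)
structured = np.concatenate([rng.normal(0, 0.1, 50), rng.normal(5, 0.1, 50)])
noise = rng.standard_normal(100)
scores = laplacian_score(np.column_stack([structured, noise]))
```

In the toy check, the structured feature varies smoothly over the graph and therefore receives a much smaller raw score than the noise feature.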

Data Analysis and Method Evaluation
In this research, we used the Parkinson's telemonitoring dataset [13] to evaluate the proposed method. The features of this dataset are listed in Table 2; a full description of these features is presented in [13]. The dataset was clustered by SOM for different topology maps. The SOM clustering quality was assessed by the silhouette index, which measures the separation between the resulting clusters. The silhouette index of the object x_i is defined as:

SI(x_i) = (b(x_i) − a(x_i)) / max{a(x_i), b(x_i)},    (11)

where a(x_i) denotes the distance of x_i to its own cluster h, characterized as the average distance between the object x_i and all the other objects in its own cluster:

a(x_i) = (1 / (n_h − 1)) Σ_{j≠i} w_jh d_E(i, j),    (12)

where n_h denotes the number of data points in cluster h, d_E(i, j) denotes the squared Euclidean distance, and w_jh is the indicator function (w_jh = 1 if object j belongs to cluster h, and w_jh = 0 otherwise). The minimum average distance between the object x_i and every other object in a cluster, excluding the cluster to which x_i belongs, is defined by b(x_i), calculated as:

b(x_i) = min_{l≠h} (1 / n_l) Σ_j w_jl d_E(i, j).    (13)

Accordingly, SI(x_i) ∈ [−1, 1]. When SI(x_i) is close to 1, the element x_i is assigned to the correct cluster. When this value is close to −1, the object x_i is in the incorrect cluster, because the neighboring cluster is a better option than the selected one. The validity of the entire clustering can then be evaluated using the overall silhouette index, defined as:

SI = (1/N) Σ_{i=1}^{N} SI(x_i).    (14)

Accordingly, we present the results for the silhouette index in Figure 2. As seen from this figure, nine clusters in SOM provide the best silhouette index, as the highest SI value is obtained for nine clusters. Hence, we clustered the PD data into nine clusters, as presented in Figure 3. The clusters in this figure are visualized using different principal components (PC 1, PC 9 and PC 16). In addition, Total-UPDRS and Motor-UPDRS are visualized using PC 2 in the nine SOM clusters. We also provide the cluster centroids in Table 3.
In this table, nine clusters are presented along with the centroid for each feature of the PD dataset. To perform UPDRS prediction in each cluster of SOM, we first used the Laplacian score technique for feature selection. The results of the feature selection are presented in Figure 4 and Table A1 in Appendix A. For these results, the features are ranked for unsupervised learning using Laplacian scores. According to the results, a large score value indicates that the corresponding PD feature is important.
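The silhouette-based assessment of the clustering described above can be sketched as follows. This is a minimal NumPy implementation using squared Euclidean distances as the dissimilarity; the two-blob dataset is purely illustrative.

```python
import numpy as np

def silhouette_index(X, labels):
    """Average silhouette index over all objects, using squared
    Euclidean distances as the dissimilarity d_E."""
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise d_E
    clusters = np.unique(labels)
    si = np.empty(len(X))
    for i in range(len(X)):
        h = labels[i]
        same = labels == h
        n_h = same.sum()
        if n_h == 1:          # a singleton cluster gets silhouette 0
            si[i] = 0.0
            continue
        # a: average distance to the other members of x_i's own cluster
        a = d[i, same].sum() / (n_h - 1)
        # b: smallest average distance to any other cluster
        b = min(d[i, labels == c].mean() for c in clusters if c != h)
        si[i] = (b - a) / max(a, b)
    return si.mean()

# Toy check with two well-separated blobs and correct labels
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.2, (40, 2)), rng.normal(4, 0.2, (40, 2))])
labels = np.array([0] * 40 + [1] * 40)
si = silhouette_index(X, labels)
```

For a well-separated clustering like this toy case, the index is close to 1; the number of SOM clusters can be selected by evaluating this index over candidate map sizes.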
The selected features of the nine clusters of the Parkinson's telemonitoring dataset were used in Gaussian process regression for UPDRS score (Total-UPDRS and Motor-UPDRS) predictions. In this study, the 10 most important features were selected for UPDRS prediction in each cluster. As a result, there were nine clusters, each of which included ten important features for UPDRS prediction.
The method was run on Microsoft Windows 10 Pro on a laptop with an Intel(R) Core(TM) i7-6700HQ CPU @ 2.60 GHz (2592 MHz, 4 cores, 8 logical processors). A 5-fold cross-validation approach was used in the hyperparameter optimization to avoid overfitting when training the Gaussian process regression models. For example, to combine RMSE and five-fold cross-validation, we applied the following steps:

•	Dividing the data into five equal parts;
•	Training the model on four parts of the data and testing it on the fifth part, calculating the RMSE for that fold;
•	Repeating step 2 for all five folds;
•	Calculating the average RMSE across all five folds.

This provided an estimate of the model's overall performance.
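The steps above can be sketched as follows. The stand-in regressor (predicting the training mean) is purely hypothetical and only marks where a fitted model, such as a GPR, would plug in.

```python
import numpy as np

def cv_rmse(fit_predict, X, y, k=5, seed=0):
    """k-fold cross-validated RMSE following the steps listed above.
    fit_predict(X_train, y_train, X_test) -> predictions for X_test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))       # shuffle, then split into k parts
    folds = np.array_split(idx, k)
    fold_rmse = []
    for f in range(k):
        test = folds[f]
        train = np.concatenate([folds[j] for j in range(k) if j != f])
        pred = fit_predict(X[train], y[train], X[test])
        fold_rmse.append(np.sqrt(np.mean((y[test] - pred) ** 2)))  # fold RMSE
    return float(np.mean(fold_rmse))    # average RMSE across folds

# Hypothetical stand-in regressor: always predict the training mean
mean_model = lambda X_tr, y_tr, X_te: np.full(len(X_te), y_tr.mean())
X = np.arange(100, dtype=float)[:, None]
y = np.ones(100)
score = cv_rmse(mean_model, X, y)
```

With a constant target, the stand-in model is exact and the averaged RMSE is zero; with a real regressor, the averaged RMSE estimates out-of-sample error.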
The nine models were assessed using RMSE and adjusted R-squared (adjusted R², the coefficient of determination). Higher values of adjusted R² indicate a better fit, with a value of 1 indicating a perfect fit, while lower values of RMSE reflect better performance by the predictor. The RMSE is presented in Equation (15):

RMSE = sqrt((1/N) Σ_{i=1}^{N} (UPDRS_i − ÛPDRS_i)²),    (15)

where N is the length of the testing vector, UPDRS_i is the actual UPDRS and ÛPDRS_i is the forecasted UPDRS. In this study, four kernels were used in Gaussian process regression: the rational quadratic kernel, squared-exponential kernel, exponential kernel and Matérn 5/2 kernel. In Tables 4 and 5, we present the average results of R-squared and RMSE for Total-UPDRS and Motor-UPDRS, respectively. The results are provided as maximum, minimum and mean values of R-squared and RMSE. The results in Tables 4 and 5 clearly show that SOM + Laplacian score + Gaussian process regression with the exponential kernel provides the best R-squared and RMSE in predicting Total-UPDRS and Motor-UPDRS, compared with the same pipeline using the squared-exponential, rational quadratic and Matérn 5/2 kernels. Furthermore, the findings reveal that the Matérn 5/2 kernel produced the largest prediction errors for Total-UPDRS and Motor-UPDRS.

Discussion
Machine learning has significant implications for PD. Researchers and healthcare providers can gain deeper insights into the disease by leveraging machine learning algorithms, allowing for earlier diagnosis, personalized treatment plans and improved symptom management. Early detection of PD is critical because early intervention can help slow disease progression and improve patient outcomes. By analyzing patient data and identifying specific patterns associated with the disease, machine learning can aid in the early detection of PD. Machine learning can also be used to create personalized treatment plans for patients with PD, taking into account individual patient data such as medical history, genetic information and response to previous treatments. This can assist healthcare providers in tailoring treatments to each patient's specific needs, improving treatment outcomes and quality of life. Furthermore, machine learning can help manage PD's symptoms, particularly through remote monitoring. Wearable devices with machine learning algorithms can monitor changes in motor symptoms and alert healthcare providers if necessary, enabling more proactive and responsive care. Overall, the implications of machine learning for PD are promising, opening up new avenues for disease diagnosis, treatment and management that can improve patient outcomes and quality of life.

Conclusions
The use of voice measurements has been an effective way of remotely tracking UPDRS. It eases the clinical monitoring of patients and increases the chances of early diagnosis of PD. Machine learning has been widely used in the analysis of speech signals for the diagnosis of PD. Accordingly, there have been many attempts at improving the accuracy of machine learning methods in this context. This study relied on feature selection, clustering and prediction learning techniques to improve the accuracy of PD diagnosis systems. We used the Laplacian score as a feature selection technique, SOM as a clustering technique based on the neural network approach, and Gaussian process regression as a prediction learning technique in the development of a new method for UPDRS prediction. SOM discovered nine clusters in the PD dataset. In each SOM cluster, the most important features were selected by the Laplacian score technique for UPDRS prediction by Gaussian process regression. Gaussian process regression was applied using different kernels, namely the rational quadratic, squared-exponential, exponential and Matérn 5/2 kernels. The method was evaluated through RMSE and adjusted R-squared. The results revealed that SOM + Laplacian score + Gaussian process regression with the exponential kernel provides the best results for R-squared and RMSE in predicting Total-UPDRS and Motor-UPDRS, compared with the same pipeline using the squared-exponential, rational quadratic and Matérn 5/2 kernels. Although the proposed method accurately predicted the UPDRS through a set of features selected by the Laplacian score, it can be further improved by optimizable Gaussian process regression.
In addition, the use of incremental Gaussian process regression is strongly suggested for the further development of the proposed method toward online learning of PD data. Incremental Gaussian process regression would significantly improve the efficiency of the proposed method, particularly for big PD datasets with many speech-signal features.

Conflicts of Interest:
The authors declare no conflict of interest.