Multimodal Learning and Intelligent Prediction of Symptom Development in Individual Parkinson’s Patients

We still do not know how the brain and its computations are affected by nerve cell deaths and their compensatory learning processes, as these develop in neurodegenerative diseases (ND). Compensatory learning processes are ND symptoms usually observed at a point when the disease has already affected large parts of the brain. We can register symptoms of ND such as motor and/or mental disorders (dementias) and even provide symptomatic relief, though the structural effects of these are in most cases not yet understood. It is very important to obtain early diagnosis, which can provide several years in which we can monitor and partly compensate for the disease’s symptoms, with the help of various therapies. In the case of Parkinson’s disease (PD), in addition to classical neurological tests, measurements of eye movements are diagnostic. We have performed measurements of latency, amplitude, and duration in reflexive saccades (RS) of PD patients. We have compared the results of our measurement-based diagnoses with standard neurological ones. The purpose of our work was to classify how condition attributes predict the neurologist’s diagnosis. For n = 10 patients, the patient age and parameters based on RS gave a global accuracy in predictions of neurological symptoms in individual patients of about 80%. Further, by adding three attributes partly related to patient ‘well-being’ scores, our prediction accuracies increased to 90%. Our predictive algorithms use rough set theory, which we have compared with other classifiers such as Naïve Bayes, Decision Trees/Tables, and Random Forests (implemented in KNIME/WEKA). We have demonstrated that RS are powerful biomarkers for assessment of symptom progression in PD.


Introduction
Experienced neurologists use their knowledge together with statistical intuition to study symptom development in Parkinson's disease (PD). By applying statistics to large databases, one can find significant information about the specificities of PD. However, due to the great variety of patients symptoms and treatments, some results obtained even from the most prominent experts can be inconsistent. Applying the most popular statistical averaging methods to such inconsistent conclusions may give confusing results, even leading to statements that specific care regimens do not effectively influence "average" PD patients. We may face similar problems when explaining factors resulting in

Materials and Methods
Our experiments were performed on 10 PD patients who had undergone Deep Brain Stimulation (DBS) surgery mainly related to their On-Off sensitivity to a medication (L-Dopa). They were qualified for the surgery and observed postoperatively in the Dept. of Neurology, Medical University of Warsaw and received surgical DBS in the Institute of Neurology and Psychiatry at University of Warsaw (WUM) [5]. We obtained horizontal reflexive saccade (RS) measurements in PD patients during four sessions designed as S1: MedOffDBSOff, S2: MedOffDBSOn, S3: MedOnDBSOff, S4: MedOnDBSOn. During the first session (S1), the patient was off medication (L-Dopa) and the DBS stimulator was Off; in the second session (S2), the patient was off medication, but the stimulator was ON; in the third session (S3), the patient had completed his/her doses of L-Dopa and the stimulator was Off, and in the fourth session (S4), the patient was on medication with the stimulator ON. Changes in motor performance, behavioral dysfunction, cognitive impairment, and functional disability were evaluated in each session according to the UPDRS. The reflexive saccades (RS) were recorded by a head-mounted saccadometer (Ober Consulting, Poland) that uses an infrared eye tracking system coupled with a head tracking system (JAZZ-pursuit-Ober Consulting, Poland). This has a sampling frequency of 1000 Hz, thus yielding high accuracy and precision in eye tracking.
Thanks to a head-mounted sensor, it is possible to compensate for possible head movements relative to the monitor.
A patient was sited at a distance of 60-70 cm from the monitor with head supported by a headrest in order to minimize head motion. We measured fast eye movements in response to a light spot switched on and off, moving horizontally from the straight eye fixation position (0 • ) to 10 • to the left or 10 • to the right after arbitrary times ranging between 0.5-1.5 s. When the patient fixated eyes on the spot in the middle marker (0 • ) the spot then changed color from white to green, indicating a signal for performance of RS reflexive saccades (RS); or from white to red meaning a signal for performing antisaccades (AS). Then the central spot was switched off and one of the two peripheral targets, selected at random with equal probability, was illuminated instead. There was no time lag between switching off of the central spot and switching on the peripheral target (gap = 0 task). Patients had to look at the targets and follow them as they moved in the RS task or made opposite direction saccades in the AS task. After making a saccade to the peripheral target, the target remained on for 0.1 s, after which another trial was initiated.
In each test, the subject had to perform 20 RS and 20 AS in a row in MedOff (medication off), within two situations: with DBSOff (S1) and DBSOn (S2). In the next step, the patient took medication and had a break for one-half to one hour, and then the same experiments were performed, with DBSOff (S3) and DBSOn (S4). In this work, we have analyzed only RS data using the following population parameters averaged for both eyes: delay mean (±SD) (standard deviation); amplitude mean (±SD); max velocity mean (±SD); duration mean (±SD).

Theoretical Basis
The structure of data is an important point of our analysis. Here, we represent them in the form of an information system or a decision table. We define such an information system [6] as a pair S = (U, A), where U, A are nonempty finite sets called the universe of objects and the set of attributes, respectively. If a ∈ A and u ∈ U, the value a (u) is a unique element of V (where V is denoted as the value set).
The indiscernibility relation of any subset B of A or IND(B), is defined [6] as follows: (x, y) ∈ IND(B) or xI (B) y if and only if a (x) = a (y) for every a ∈ B, where a (x) ∈ V. IND(B) is an equivalence relation, and [u] B is the equivalence class of u, or a B-elementary granule. The family of all equivalence classes of IND(B) will be denoted U/I(B) or U/B. The block of the partition U/B containing u will be denoted by B(u).
We define a lower approximation of symptoms set X ⊆ U in relation to a symptom attribute B as BX = {u ∈ U : [u] B ⊆ X}, and the upper approximation of X as BX = {u ∈ U : [u] B ∩ X = 0}. In other words, all symptoms are classified into two categories (sets). The lower movement approximation set X has the property that all symptoms with certain attributes are part of X, and the upper movement approximation set has property that only some symptoms with attributes in B are part of X (for more details see [7]). The difference between the uppper and lower approximations of X is defined as the boundary region of X: BN B (X). If BN B (X) is empty then X is exact (crisp) with respect to B; otherwise if BN B (X) = φ then X is not exact (i.e., is rough) with respect to B. We say that the B-lower approximation of a given set X is union of all B-granules that are included in X, and the B-upper approximation of X is of the union of all B-granules that have nonempty intersection with X.
The system S will be called a decision table S = (U, C, D) where C is the condition and D is the decision attribute [6,7]. In the decision table, the decision attribute D is placed in the last column, and condition attributes measured by the neurologist, are placed in other columns. On the basis of each row in the table, rules describing the condition of each patient can be proposed. As the number of rules is the same as the number of rows, these rules can have many particular conditions. The main concept of our approach is to describe different symptoms in different patients by using such rules.
However, as there are large inconsistencies between disease progression and symptoms between individual patients, and effects of similar treatments are not always the same. In order to describe such differences our rules must have some "flexibility", or granularity. In other words, one can also think about the probability of finding certain symptoms in a certain group of patients. This approach, called granular computation, should simulate the way in which experienced neurologists might interact with patients. It is the ability to perceive a patient's symptoms at various levels of granularity (i.e., abstraction) in order to abstract and consider only those symptoms that are universally significant and serve to determine a specific treatment(s). Looking into different levels of granularity determines the different levels of knowledge abstraction (generality), as well as assisting with greater understanding of the inherent knowledge structure. Granular computing simulates human intelligentce related to classification of objects in the sensory system (e.g., vision [2][3][4][5]) and motor [1] system as well as in task-specific problem solving behaviors.
We have also defined the notion of a reduct and template [1] as a basis for the decomposition. A decomposition tree is defined as a binary tree, whose every internal node is labeled by some template and external node (leaf) is associated with a set of objects matching all templates in a path from the root to a given leaf [8].
In our tests, we have divided our data into two or more subsets (mostly to four or six subgroups-four-or six-folding cross-validation tests). By training on all except one of these subgroups (the training set) using machine learning (ML), we have obtained classifiers that when applied to the remaining (test) set gave new numerical decision attributes. In the next step, we have correlated decision attributes obteined by ML with neurologist decision attributes in order to get a confusion matrix.

Computational Basis
We have extended our previous work [1] with new classification methods that were performed with the help of the KNIME Analytics Platform and WEKA. As a result, we have added new algorithms including:

•
Random Forest (two implementations: KNIME and WEKA) [9] • Decision Tables (two implementations: KNIME and WEKA) [10] • Bayes classifier (KNIME implementation) [11] • Tree ensembles (KNIME implementation) Tree ensemble models build classifiers based on ensembles of the decision trees that may be seen as random forest variants. As in the random forest model, training can be done on different sets of rows (records) and/or columns (describing attributes). The output model describes an ensemble of decision trees and is implemented by a simple majority vote.
We have applied certain sets of parameters for each of the above-mentioned algorithms. For the naïve Bayes classifier, we have use 0 as the default probability and 20 as the maximum number of nominal values per attribute. For the decision tree, we have used the gini index as a quality measure, no pruning with the reduced error option, a minimum number of records was 2, a maximum number of records per view was 1000, an average split point and the skipping of nominal columns without domain information. In the case of the WEKA implementation, we used the following evaluation metrics: accuracy (for discrete classes) and root mean square error (for numeric classes). For tree ensembles and the random forest (where applicable), we used the information gain ratio as split criteria, mid plot split, no tree limit, minimum split size and child node size, square root attribute sampling for columns, 100 of model number, different attribute set for each tree node. In the case of WEKA implementation, the number of trees was limited to 10.
Input data for those algorithms were prepared using native KNIME nodes. Processing included as it was in the original work discretization function. It is important to note that the discretization function in KNIME is much simpler that the one used in RSES. In comparison, the more sophisticated RSES discretizes functions, generating cuts based on decision attributes [12]; it bins numerical values into groups with equal frequencies.
In addition to the standard accuracy coefficient, we have also calculated from the confusion matrix the Matthews correlation coefficient that is a correlation coefficient between the observed and predicted binary classifications. We also have used SMOTE in order to oversample our data, and we have used nearest neighbor 3 with oversample by three parameters.

Dedicated Database
In the course of this study, a dedicated database for patients and measurement data has been designed and implemented. Given the number of patients under consideration and a moderate volume of the processed data, this project did not pose significant data management challenges. However, both the immediate needs of data analysis, and plans for the future maintenance of the dataset and the development of the related data collection and retrieval functionality, brought some important architectural and functional requirements.
To help protect the data integrity, both the alphanumerical data as well as the associated files (in the form of Binary Data Objects) are kept under the control of the database management system. The core structure of the database is designed to match the exact needs of the current research. It consists of the anonymized patient data, their associated history of examinations, and finally their measurement variants-with respect to the DBS (deep brain stimulation) and BMT (medication) on/off configuration. Given the number of the attributes and the evolution of their set in the early phase of the project, it became necessary to redesign the data structure to allow a greater degree of generality. Hence the structure is open for further extensions in terms of:

•
Defining new attributes describing the Patient, Examination, and Measurement entities; • Defining associated constraints or the dictionary of enumerative values for those attributes; • Defining new types of files for storing additional information regarding Patients, Examinations, and Measurements; • Establishing a numbering and naming convention for the instances of the above mentioned entities and files.
Although the current setup involves only the data collection and interface by a single team, the software is being prepared for functioning in a more distributed and versatile manner. The functionality under development includes: • Web service interfaces for contributing measurements made with mobile devices; • Advanced privilege management functionality to allow multiple-team collaboration and support for different access levels; • Internationalization in terms of user interface language and data and metadata language; • Integration with the human motion database, to extend the scope of future examination.
The database is currently operated by PJAIT and is to be maintained after the completion of the current project.
Differences between latencies: S1-S2, and S1-S4 were statistically significant (t-test p < 0.01) even when they were variable in individual patients ( Figure 1); meanwhile S1-S3 was not statistically significant; this is similar to differences between UPDRS/UPDRS III: S1-S2 and S1-S4 were statistically significant (p < 0.001), and S1-S3 was not statistically significant. Other parameters of RS did not change significantly with the session number.

Rough Set and Machine Learning Approach
As described above, we have used the RSES 2.2 (Rough System Exploration Program) [8] in order to find regularities in our data. At first, our data were placed in an information table (Table 1) as originally proposed by Pawlak [6]. The full table has 15 attributes and 36 objects (measurements). In Table 1 are values of 11 attributes for two patients: P#-patient number, age-patient's age, sex-patient's sex: 0-female, 1-male, t_dur-duration of the disease, S#-Session number, UPDRS-total UPDRS, HYsc-Hoehn and Yahr scale all measured by the neurologist and saccades measurements: SccDur-saccade duration; SccLat-sac-cade latency; SccAmp-saccade amplitude, and SccVel-saccade velocity.
In the next step, we have performed reduction of attributes (see reduct in the Methods section) to a minimum number of attributes describing our results. We have also created a discretization table: Other parameters of RS did not change significantly with the session number.

Rough Set and Machine Learning Approach
As described above, we have used the RSES 2.2 (Rough System Exploration Program) [8] in order to find regularities in our data. At first, our data were placed in an information table (Table 1) as originally proposed by Pawlak [6]. The full table has 15 attributes and 36 objects (measurements). In Table 1 are values of 11 attributes for two patients: P#-patient number, age-patient's age, sex-patient's sex: 0-female, 1-male, t_dur-duration of the disease, S#-Session number, UPDRS-total UPDRS, HYsc-Hoehn and Yahr scale all measured by the neurologist and saccades measurements: SccDur-saccade duration; SccLat-sac-cade latency; SccAmp-saccade amplitude, and SccVel-saccade velocity.
In the next step, we have performed reduction of attributes (see reduct in the Methods section) to a minimum number of attributes describing our results. We have also created a discretization table: here, single values of measurements were replaced by their range (as describe in the Method section on cut sets). As the result we have obtained the decision table (Table 2-see below).  In the first column is the patient number, in the second the patient age, dividing our group into patients below (Pat#28) or above (Pat#38) 55 years of age; disease duration and the Hoehn and Yahr scale were not considered important (stars), along with session number. Other parameters of saccades were also divided into ranges. It is interesting to note how the UPDRS were divided into different ranges: above 55, 22.5 to 55, 14 to 22.5, and below 14 (the last column). On the basis of this decision table, we can write the following rule: We read this formula above (Equation (1)), as stating that each row of the table (Table 2) can be written in form of such equation. It states that if we evaluate patient #38 and with age above 55 years and in session #4 and with saccade duration below 45.5 ms and saccade latency below 260 ms and . . . and saccade amplitude below 10.5, then the patient's UPDRS is below 14.

Estimation of UPDRS on the Basis of Eye Movement Measurements
In the next step, we need to find common rules for these equations by using a data mining system based on rough set theory [6]. We have tested our rule using machine-learning (ML) algorithms. At first, we have randomly divided our data into six groups (so-called a six-fold cross-validation method). We took five groups as a training set and tested the sixth one. By changing groups belonging to the training and test sets, we have removed the effect of accidental group divisions. The results of each test were averaged. The results are given as a confusion matrix (Table 3). In this case, as a ML algorithm that gave the best results we have used the decomposition tree (see the Methods section).

Estimation of UPDRS on the Basis of Eye Movement and Additional Measurements
Next, we extended our tests (in comparison with [1]) by adding to our dataset additional attributes, including: • PDQ39-quality of life measure • AIMS-Abnormal Involuntary Movement Score • Epworth-sleep scale A significant increase in accuracy was achieved in prediction of Total UPDRS by adding the PDQ attribute. We were able to achieve 90.3% of accuracy with 47% coverage. We can see confusion matrix for this classification in Table 4. We have performed several tests trying to predict UPDRS values on the basis of measured saccade properties. As changes in UPDRS and saccade latencies were similar when the session number was changed ( Figure 2) we tried to predict individual UPDRS values only from RS latencies. When the session number, patient age, RS latency, amplitude, and duration were added, the global accuracy in UPRDS prediction was about 80% (ML: decomposition tree, cross-validation-method). As UPDRS is a standard measurement in PD, the above results give the possibility of at least partly replacing or augmenting neurologist estimates with the eye movements (EM) measurement results.

Estimation of UPDRS on the Basis of Eye Movement and Additional Measurements
Next, we extended our tests (in comparison with [1]) by adding to our dataset additional attributes, including: Epworth-sleep scale A significant increase in accuracy was achieved in prediction of Total UPDRS by adding the PDQ attribute. We were able to achieve 90.3% of accuracy with 47% coverage. We can see confusion matrix for this classification in Table 4. We have performed several tests trying to predict UPDRS values on the basis of measured saccade properties. As changes in UPDRS and saccade latencies were similar when the session number was changed ( Figure 2) we tried to predict individual UPDRS values only from RS latencies. When the session number, patient age, RS latency, amplitude, and duration were added, the global accuracy in UPRDS prediction was about 80% (ML: decomposition tree, cross-validation-method). As UPDRS is a standard measurement in PD, the above results give the possibility of at least partly replacing or augmenting neurologist estimates with the eye movements (EM) measurement results.

Estimation of Different Therapies Effects
Another question that arises is whether EM can help to estimate possible effects of different treatments in individual patients. In order to demonstrate an answer, we have removed EM measurements and added other typically measured attributes such as: Schwab and England ADL Scale, and UPDRS III and UPDRS IV to the decision table and tried to predict the effects of different treatments as represented by sessions 1 to 4 (medication and stimulation effects). Table 5 is a part of full decision discretized-table with decision attributes-the session number placed in the last column. On the basis of this table, we have formulated rules using a rough set system and tested them with randomly divided data into six groups. We took five as a training set (using ML protocol) and tested

Estimation of Different Therapies Effects
Another question that arises is whether EM can help to estimate possible effects of different treatments in individual patients. In order to demonstrate an answer, we have removed EM measurements and added other typically measured attributes such as: Schwab and England ADL Scale, and UPDRS III and UPDRS IV to the decision table and tried to predict the effects of different treatments as represented by sessions 1 to 4 (medication and stimulation effects). Table 5 is a part of full decision discretized-table with decision attributes-the session number placed in the last column. On the basis of this table, we have formulated rules using a rough set system and tested them with randomly divided data into six groups. We took five as a training set (using ML protocol) and tested with sixth. In order to remove effect of the accidental division, we have exchanged training and test groups and averaged results placed in the confusion matrix (Table 6). Where * (star) is related to parameter(s) excluded (not significant) in the discretization process. Table 6. Confusion matrix for different session numbers (S1-S4). Column headers represent predicted values and row headers represent actual values. We have performed the same procedures once more (results of discretization are in Table 7) in order to test results of patients' eye movement influence on our predictions (presented as a confusion matrix in Table 8).  TPR: True positive rates for decision classes, ACC: Accuracy for decision classes, the global coverage was 0.5; the global accuracy was 0.91; coverage for decision classes: 0.6, 0.6, 0.6, 0.25.
As above, results of each test were averaged in a six-fold cross-validation resulting a confusion matrix ( Table 8). As a machine-learning algorithm, we have used the decomposition tree (see Methods). In the next step, we have extended our data set by three additional parameters: PDQ39, AIMS, Epworth and performed discretization as is shown below in Table 9. By extending our dataset with additional attributes we were able to achieve 4% more in accuracy for session prediction that make it 94.4% level. The confusion matrix for this test can be found in Table 10.

Estimation of UPDRS II and UPDRS III on the Basis of Eye Movement and Additional Measurements
Further on, we have extended our original work [1] by creating models for predicting other attributes; the best results we were able to achieve for UPDRS II and III, reaching 85% accuracy and 67.7% coverage for UPDRS II, and 81.7% accuracy and 61.1% coverage for III. Below we can see detailed discretized tables used in those models (Table 11 for UPDRS III) as well as confusion matrixes (Table 12 for UPDRS III, Table 13 for UPDRS II).

Comparison of Rough Set Based and Other Classifiers
Our ultimate purpose was to compare our models built using RS methods with other well-known algorithms. To accomplish this, we have used the same information tables and processed them with different classification algorithms as described in the methods section and results are shown below in Table 14. Table 14. As we can for those data sets we were able to achieve the most accurate results for the Rough Set-based approach but in most cases there were algorithms that could give predictions on a similar level, with the exception of classifiers for the session. Classifiers were tested on dataset with PDQ attribute. In summary, our new results have demonstrated that adding eye movement (EM) measurements to classical neurological data performed as the 'gold standard' by the most clinicians, improved predictions of disease progression, as measured by improvement in global accuracy from 0.53 to 0.91. The EM measurements may also partly replace some standard neurological measurements such as the UPDRS, as global accuracy of the total UPDRS predictions taken from EM data was 0.79 for our 10 PD patients. We have also performed comparisons to other well-known ML algorithms showing competitiveness of the RS algorithm (Table 14). All data from this study are shown in Table A1 (Appendix A). We have also tested influence of different training and testing sets on the accuracy of our classifications (Table B1, Table C1 in Appendixs B and C). We have used four-, five-, six-fold, and leave-one-out (40-fold for our dataset) tests. In the majority of tests for different classifiers (Table B1 in Appendix B), the six-fold gave similar accuracy to four-or five-fold and leave-one-out tests. However, for the rough set based classification in the majority of different parameters tests six-fold trials were the best.

Classifier
As our database is relatively small, we did not perform direct subsampling only oversampling by 3 with help of SMOTE. It was surprising that oversampling gave the best results for all algorithms. It is probably related to the fact that our dataset is small, as our inter-rater quantitative agreement measured by the Cohen's kappa was substantial (0.7) for all test with WEKA-Decision Table. We have also calculated the Matthews correlation coefficient (MCC) as another measure of prediction accuracy, which can be interpreted as a correlation coefficient between the observed and predicted classifications (Tables B1 and C1). We have got significant MCC with rough set based classification for UPDRS and session number (0.85 and 0.95- Table C1 in Appendix C), but for other algorithms only for the oversampled data (Table B1 in Appendix B).

Discussion
As mentioned above, there are significant differences between symptom developments and effects of different treatments in individual PD patients. As a consequence, even with large numbers of approaches and clinical trials, there still have been few conclusive results on direct therapeutic connection to quantitative changes in PD symptoms. There are numerous reasons for such disappointments: first, the limitations of current disease models in target validation and testing; second, difficulties in choosing clinical endpoints; third, problems in finding sensitive biomarkers of disease progression. The major drawback is that disease starts long before an individual patient notices the first disturbing motor symptoms. As mentioned above, a consequence of brain plasticity is compensation for loss of neurons, to an extent that subjects do not realize that problems are occurring in their brains over one or two decades at a time. Another aspect is that, because individual brains develop different specific compensatory mechanisms, individual pathologies form a large spectrum. Therefore, one needs new, more sensitive, possibly automatic and intelligent methods that may help in early diagnosis. In this study, we have given an example comparing classical neurological diagnostic protocols with a new approach. The main difference between these types of measures is in their precision and objectivity. Our approach is doctor-independent, and can be performed automatically and more often than standard doctor's visits. In the near future, it may help in transforming some hospital-based to home-based treatments. In this scenario, it will be possible to measure patient symptoms at home, and send these for consultation by neurologists. Such methods will be faster, more precise, and can help with more frequent measurements. As a consequence, they may help not only to determine more objectively a patient's symptoms, but also to follow up disease progression in more frequent intervals, something not possible currently, with the limited time resources of neurologists.
If we obtain such information, it may lead to more appropriate therapies and slowing down of disease progression. It is one of the purposes of this work to try to extract knowledge from symptoms in order to further develop more appropriate therapies and better control disease development.

Conclusions
We have presented a comparison of classical statistical averaging methods related to doctors' intuition and their PD diagnosis with different AI approaches. We have used processed neurological data from PD patients in four different treatments and have plotted averaged effects of medication and brain stimulation in individual patients. As these effects are strongly patient-dependent, they could not give enough information to predict new patients' behavior. The AI approaches (such as machine learning and data mining) are more universal, giving general rules for predicting individual patient responses to treatments as demonstrated above (in UPDRS, UPDRS II, UPDRS III, or session number predictions).