Peer-Review Record

Classification of Parkinson’s Disease Using Machine Learning with MoCA Response Dynamics

Appl. Sci. 2024, 14(7), 2979; https://doi.org/10.3390/app14072979
by Artur Chudzik 1 and Andrzej W. Przybyszewski 1,2,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 24 February 2024 / Revised: 25 March 2024 / Accepted: 28 March 2024 / Published: 1 April 2024

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

In this paper, the authors built an online platform for screening and diagnostics of Parkinson's disease based on the MoCA test. The authors collected test results from 31 subjects using the platform and applied several machine learning methods to examine how accurately the subjects' health status could be predicted from the collected MoCA scores. Generally, a dataset of 31 subjects is too small, and the reliability of any models trained on these data, and of the related results, is therefore limited.

I have some major concerns below:

1. My biggest concerns are how the machine learning models were trained and how the authors utilized them to predict the subjects' health status. 

Did the authors train the machine learning models using the 31 subjects and then predict their outcomes? If this is the case, then certainly there exists a significant risk of overfitting, and the information from the testing dataset may have leaked into the training processes, which makes the results of experiments I-III unreliable. In fact, the high AUCs in experiment III are rare to see in practice and suggest a potential leak of knowledge of the testing dataset. 

2. In Section 3.7, the authors mentioned using regularization methods such as the L2 norm to prevent overfitting. What does this mean? Does it mean that the authors performed some feature selection for logistic regression and the other machine-learning models in the paper? If yes, which set of features was considered?

3. Why did the authors consider using gender rather than age as a predictor, especially given the significant association between age and health status?

4. How was the health status defined?

Other comments: 

It is unclear why, in the introduction, the authors spend several paragraphs discussing the importance of engineering perspectives in biomedical studies, which has only a weak connection to the research topic of the paper, instead of devoting more than a single sentence to their online screening and diagnostics platform.

 

Author Response

Reviewer #1

Dear Reviewer. We appreciate your detailed feedback on our manuscript, "Classification of Parkinson’s Disease Using Machine Learning with MoCA Response Dynamics." Your insights have been valuable in refining this paper.

Below, we address each of your points individually, explaining how we have considered them in our revision process. All changes are marked in yellow in the manuscript, for better readability.

Point #1: My biggest concerns are how the machine learning models were trained and how the authors utilized them to predict the subjects' health status. Did the authors train the machine learning models using the 31 subjects and then predict their outcomes? If this is the case, then certainly there exists a significant risk of overfitting, and the information from the testing dataset may have leaked into the training processes, which makes the results of experiments I-III unreliable. In fact, the high AUCs in experiment III are rare to see in practice and suggest a potential leak of knowledge of the testing dataset.

Response: Thank you for this comment; we greatly appreciate your interest in our research methodology. Handling such a limited dataset indeed requires careful assessment. As we mention in the text, we employed a train_test_split approach, allocating 30% of the data for testing before modeling. This split was done randomly, so that 21 subjects were used for training and 10 for testing, preventing any systematic bias in the allocation of data.

To further mitigate the risk of overfitting, we also implemented cross-validation techniques during the model training phase. This approach allowed us to assess the model's performance more accurately and ensure its generalizability to new, unseen data.
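For illustration, a minimal sketch of this split-then-validate procedure in scikit-learn is given below; the data are synthetic stand-ins and the parameter values (random seed, number of folds) are assumptions for the example, not necessarily the exact settings used in the study.

    import numpy as np
    from sklearn.model_selection import train_test_split, cross_val_score
    from sklearn.linear_model import LogisticRegression

    # Synthetic stand-in data: 31 subjects, 3 features (illustrative only).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(31, 3))
    y = np.array([0, 1] * 15 + [0])   # 0 = PD, 1 = healthy (toy labels)

    # 70/30 random split performed before any modeling.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.30, random_state=42, stratify=y
    )

    # k-fold cross-validation restricted to the training portion only.
    model = LogisticRegression(penalty="l2", max_iter=1000)
    cv_auc = cross_val_score(model, X_train, y_train, cv=5, scoring="roc_auc")
    print(cv_auc.mean())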

Additionally, we applied regularization methods where possible in machine learning models to penalize complexity and discourage the learning of noise from the training set. This step helps to prevent overfitting, making our model more robust to unseen data.

We believe these measures collectively helped to minimize the risk of overfitting and data leakage, thus supporting the reliability of the results of experiments I-III. We hope that this additional context addresses your concerns and demonstrates the robustness of the study.

Point #2: In Section 3.7, the authors mentioned using regularization methods such as the L2 norm to prevent overfitting. What does this mean? Does it mean that the authors performed some feature selection for logistic regression and the other machine-learning models in the paper? If yes, which set of features was considered?

Response: Thank you for your question regarding our use of regularization methods and feature selection in this study. To clarify, regularization is a technique designed to prevent overfitting by adding a penalty to the model's complexity, thus encouraging the learning of simpler models. Specifically, L2 regularization, which we employed in this study, penalizes the sum of the squared model weights. This approach discourages large weights, leading to a model that is less likely to fit the noise in the training data and more likely to generalize well to new, unseen data.
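For reference, one standard way to write the L2-penalized logistic regression objective is the textbook formulation below, with labels y_i in {-1, +1} and regularization strength lambda; scikit-learn exposes the inverse strength as C (larger C means a weaker penalty).

    \min_{w,\,b}\;\; \sum_{i=1}^{n} \log\!\left(1 + e^{-y_i\,(w^{\top} x_i + b)}\right) \;+\; \lambda \lVert w \rVert_2^2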

In selected machine learning models, including logistic regression among others, we applied L2 regularization to enhance the models' generalization capabilities. The decision to use L2 regularization was based on its effectiveness in handling multicollinearity, preventing model overfitting, and its ability to produce models that perform well on unseen data.

Regarding feature selection, the application of L2 regularization indirectly influences feature selection by shrinking the coefficients of less important features closer to zero, which helps in identifying the more significant predictors. However, it is important to note that L2 regularization does not perform explicit feature elimination but rather adjusts the scale of each feature's contribution. Thank you for the opportunity to clarify that.
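As a small, self-contained illustration of this shrinkage effect (synthetic data and assumed parameter values, not the study's actual features), decreasing scikit-learn's C, the inverse regularization strength, pulls the logistic regression coefficients toward zero without eliminating any feature outright:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Toy data: 31 samples, 3 features (illustrative only).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(31, 3))
    y = np.array([0, 1] * 15 + [0])

    for C in (0.01, 1.0, 100.0):              # smaller C = stronger L2 penalty
        clf = LogisticRegression(penalty="l2", C=C, max_iter=1000).fit(X, y)
        print(C, np.abs(clf.coef_).round(3))  # coefficient magnitudes shrink as C decreases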

Point #3: Why did the authors consider using gender rather than age as a predictor, especially given the significant association between age and health status?

Response: Thank you for raising this important question about the choice of predictors in the study. The decision to utilize gender as a predictor, rather than age, was informed by several considerations.

As you rightly noted, there was a significant age gap among participants, with a strong correlation between age and health status. Preliminary analyses indicated that using age as a predictor would lead to a model that essentially segregates the data into two clusters based solely on age. This would result in perfect-fit issues, where the model overfits the training data by capturing the age-health status relationship too closely, and would also limit the model's applicability and generality to broader populations.

By focusing on gender alongside other predictors, we aimed to develop models that could potentially identify health status indicators across more diverse and inclusive demographic profiles. This approach allows for the exploration of health determinants in a manner that is not solely reliant on age, which is particularly valuable in settings where age may not be the primary factor of interest or where age information is not available.

In conclusion, we acknowledge the significance of age as a determinant of health status; however, the specific context and goals of our study led us to prioritize the investigation of other variables. We believe this approach contributes valuable insights into the complex interplay of factors affecting health and encourages further research into diverse predictors of health outcomes. Thank you for the opportunity to clarify that point.

Point #4: How was the health status defined?

Response: Thank you for your question on how we defined health status within our study. Health status was determined based on a combination of clinical diagnoses and severity of PD symptoms. Participants were divided into two primary groups:

  1. PD Group: This group was composed of individuals with a confirmed diagnosis of Parkinson’s disease, verified through the treatment and advice they received from neurologists at UMass Chan Medical School. Within this group, health status was further categorized based on the severity of PD symptoms, measured by the Unified Parkinson's Disease Rating Scale (UPDRS) Part III: MILD (UPDRS III scores between 10 and 29) and ADVANCED (UPDRS III scores above 30).
  2. Reference Group: To create a contrast with the PD group, we included a group of students from the Polish-Japanese Academy of Information Technology, chosen based on the low likelihood of young people having PD. This group was presumed healthy for the purposes of our study, given their demographic profile and absence of PD symptoms, despite not undergoing neurological examinations to directly confirm the absence of PD.

Our approach aims to establish a distinction between individuals at varying stages of PD and a generally healthy population. This method allowed us to explore and identify patterns related to PD's impact on health status. Therefore, we believe this method of defining health status is appropriate for our study's objectives, providing a foundation for analyses and conclusions regarding PD and its indicators. Thank you for the opportunity to clarify that point.

Point #5: It is unclear why, in the introduction, the authors spend several paragraphs discussing the importance of engineering perspectives in biomedical studies, which has only a weak connection to the research topic of the paper, instead of devoting more than a single sentence to their online screening and diagnostics platform.

Response: Thank you for your insightful feedback regarding the content of the introduction. We recognize the importance of providing a balanced overview that not only sets the stage for an interdisciplinary approach but also highlights the significance of the online platform.

Upon reflection, we agree that the introduction could benefit from a more detailed discussion of the online platform, given its centrality to our study's objectives and methodology. The intention behind presenting engineering principles was to underline the necessity of such perspectives in modernizing biomedical solutions, especially for complex conditions like neurodegenerative diseases. However, we acknowledge that this focus may have overshadowed the detailed explanation of our platform, which is a central component of our research.

Therefore, to address this, we made the following amendments to our introduction. First, we reduced the length of the discussion on engineering perspectives and focused directly on the challenges, development, and application of biomedical technologies. This allows us to maintain the interdisciplinary context without weakening the focus on the research subject.

Furthermore, we expanded the description of our online screening and diagnostics platform by detailing the transformation of the Montreal Cognitive Assessment (MoCA) into its online version. This expansion includes:

  1. a rationale behind selecting MoCA for digitization, showing its established reliability and effectiveness in cognitive evaluation,
  2. a description of the technological and methodological adaptations made to ensure the online test's fidelity to its pen-and-paper counterpart,
  3. an overview of the challenges encountered during this transition, particularly regarding usability for elderly users, and how these challenges were addressed to minimize their impact on test results.

Moreover, we highlighted the relevance of machine learning algorithms in enhancing the reliability and accuracy of online diagnostics, providing a clearer connection to the engineering principles discussed earlier.

Thank you again for your feedback. We appreciate the opportunity to improve the clarity and focus of our paper and believe these amendments will enhance the introduction.

We greatly appreciate the time and effort that the Reviewer put into reviewing our work. We believe that this clarification strengthens the manuscript by providing a more nuanced understanding of the complexities in applying ML and digital biomarkers for the detection of neurodegenerative diseases (NDs).

Reviewer 2 Report

Comments and Suggestions for Authors

In this manuscript, the authors classify Parkinson’s Disease (PD) using rough set theory (RST) and classic machine learning (ML) techniques, including logistic regression, support vector machines, and random forests. They compare the diagnostic efficiency of these ML models by analyzing Montreal Cognitive Assessment (MoCA) test results to classify individuals with PD and healthy subjects based on their scores. Overall, the manuscript is well written, but several issues in the methodology section need to be addressed:

Correction in Section 3.7: The manuscript mentions matplotlib, seaborn, and pandas in the context of sklearn. However, these libraries are not part of the scikit-learn (sklearn) package. The authors should correct this for readability.

High Correlation Among Features (Figure 1): The authors observe high correlation among the features used. It is essential to address the issue of multicollinearity in the data before analysis. Multicollinearity can impact the performance of models, such as Logistic Regression and Support Vector Machine (SVM), leading to biased results. The authors should provide details on how they mitigate this issue.

ML Algorithm Architecture and Working: The study utilizes an ML algorithm, but the manuscript lacks details about its architecture and how it works. Providing information on the chosen algorithm would enhance the reader’s understanding.

Data Split and k-Fold Cross-Validation (CV): The authors mention using k-fold CV for hyperparameter tuning. It is crucial to specify whether the test data was separated before applying k-fold CV. If the test data was not separated, there is a risk of overfitting, compromising model generalization.

Handling Categorical Features: The manuscript does not elaborate on how categorical features were handled during preprocessing and modeling. Clear documentation of feature encoding or transformation methods is necessary.

Author Response

Reviewer #2

Dear Reviewer. We appreciate your detailed feedback on our manuscript, "Classification of Parkinson’s Disease Using Machine Learning with MoCA Response Dynamics." Your insights have been valuable in refining this paper.

Below, we address each of your points individually, explaining how we have considered them in our revision process. All changes are marked in yellow in the manuscript, for better readability.

Point #1: Correction in Section 3.7: The manuscript mentions matplotlib, seaborn, and pandas in the context of sklearn. However, these libraries are not part of the scikit-learn (sklearn) package. The authors should correct this for readability.

Response: Thank you for pointing out the inaccuracy in our manuscript regarding the description of Python libraries used in conjunction with scikit-learn (sklearn). We understand the importance of clearly and accurately describing the software tools and libraries employed in our research, as it directly impacts the readability and reproducibility of our work.

The sentence in Section “Machine Learning Approach” indeed inaccurately suggested that matplotlib, seaborn, and pandas are part of the scikit-learn package. To clarify, these libraries are used in conjunction with scikit-learn for data manipulation, visualization, and analysis but are not sub-packages or components of scikit-learn itself. The corrected sentence should read:

"To challenge the Rough Set Exploration System results, we implemented three classic machine learning models in Python using the scikit-learn (sklearn) library, which facilitates a range of machine learning tools. Additionally, we utilized complementary Python libraries for data manipulation and visualization, including pandas, seaborn, and matplotlib."

This correction ensures that the description accurately reflects the distinct roles and contributions of these libraries to our research methodology. Moreover, we included missing references for these libraries. We appreciate your attention to detail and the opportunity to update this paragraph and to improve the accuracy of this manuscript.
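If helpful, a minimal sketch of the corresponding imports makes the separation of roles explicit; the three classifiers are those named in the study, while the grouping of imports is purely illustrative.

    # scikit-learn supplies the classifiers; pandas, seaborn, and matplotlib
    # are separate libraries used for data handling and visualization.
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC
    from sklearn.ensemble import RandomForestClassifier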

Point #2: High Correlation Among Features (Figure 1): The authors observe high correlation among the features used. It is essential to address the issue of multicollinearity in the data before analysis. Multicollinearity can impact the performance of models, such as Logistic Regression and Support Vector Machine (SVM), leading to biased results. The authors should provide details on how they mitigate this issue.

Response: Thank you for raising the critical issue of multicollinearity observed among the features in our dataset. Multicollinearity, where independent variables are highly correlated, can indeed skew the results of certain machine learning models, affecting the accuracy of predictions and the interpretability of the model coefficients. To address and mitigate the effects of multicollinearity in our study, we took several strategic steps as outlined below.

  1. Experiment I/III focuses solely on the MoCA score for predicting health status, effectively isolating this feature to examine its predictive power in the absence of correlated variables.
  2. Experiment II/III introduces gender alongside the MoCA score, incorporating a categorical variable less likely to introduce significant multicollinearity but offering additional insights into the model's performance.
  3. Experiment III/III combines the MoCA score with IRT (moca_reaction_ms) and TTS (moca_response_ms), which are related but capture distinct aspects of the cognitive assessment. Despite their correlation, their joint inclusion is intended to explore the combined predictive strength of reaction and response times on health status.

Furthermore, for models susceptible to multicollinearity, such as Logistic Regression and SVM, we applied regularization techniques (such as L2 regularization) that penalize large coefficients, thus reducing the risk of overfitting and mitigating the adverse effects of multicollinearity. We appreciate your attention to detail and the opportunity to improve the accuracy of this manuscript.
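A sketch of the kind of correlation check behind Figure 1 is given below; the column names and the synthetic values are assumptions for illustration and may not match the study's exact schema.

    import numpy as np
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt

    # Synthetic stand-in data with assumed column names.
    rng = np.random.default_rng(0)
    reaction = rng.normal(800, 150, size=31)
    df = pd.DataFrame({
        "moca_score": rng.integers(18, 31, size=31),
        "moca_reaction_ms": reaction,                              # IRT
        "moca_response_ms": reaction + rng.normal(1200, 200, 31),  # TTS
    })

    # Pairwise Pearson correlations, visualized as a heatmap (cf. Figure 1).
    sns.heatmap(df.corr(), annot=True, vmin=-1, vmax=1, cmap="coolwarm")
    plt.title("Correlation among MoCA-derived features")
    plt.show()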

Point #3: ML Algorithm Architecture and Working: The study utilizes an ML algorithm, but the manuscript lacks details about its architecture and how it works. Providing information on the chosen algorithm would enhance the reader’s understanding.

Response: Thank you for the opportunity to clarify the architecture and workings of the machine learning (ML) algorithms utilized in our study. Our methodology centers on Rough Set Theory (RST) and is complemented by a comparison with three machine learning models: Logistic Regression, Support Vector Machine (SVM), and Random Forest. In the revised manuscript, we now provide a detailed overview and the relevant equations to enhance understanding of the chosen algorithm (RST).

Thank you for this opportunity to clarify how the ML algorithms work. We hope that this detailed explanation addresses the initial concern regarding the lack of specific details on the ML algorithm architecture.
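For completeness of this response, the standard rough-set notions that the added overview builds on can be stated as follows (a textbook formulation for an information system (U, A), attribute subset B ⊆ A, and concept X ⊆ U; the manuscript gives the full treatment):

    \mathrm{IND}(B) = \{(x, y) \in U \times U : \forall a \in B,\ a(x) = a(y)\}
    \underline{B}X = \{x \in U : [x]_B \subseteq X\}, \qquad
    \overline{B}X = \{x \in U : [x]_B \cap X \neq \emptyset\}

Here [x]_B denotes the equivalence class of x under IND(B); the lower and upper approximations collect the objects that certainly and possibly belong to X, respectively.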

Point #4: Data Split and k-Fold Cross-Validation (CV): The authors mention using k-fold CV for hyperparameter tuning. It is crucial to specify whether the test data was separated before applying k-fold CV. If the test data was not separated, there is a risk of overfitting, compromising model generalization.

Response: Thank you for highlighting the importance of careful data management practices in the study, specifically concerning the use of k-fold cross-validation (CV) and the separation of test data. We recognize the critical nature of these procedures in maintaining the validity and reliability of machine learning models, and we appreciate the opportunity to clarify our methodology.

To directly address your concern: Yes, the test data was indeed separated before applying k-fold CV for hyperparameter tuning. Here's a detailed overview of our approach:

  1. Initially, we partitioned our dataset into a training set and a test set. This separation was conducted at the outset, ensuring that the test set remained untouched and completely independent throughout the model training and hyperparameter tuning phases. Specifically, we allocated 70% of the data for training and 30% for testing, a common practice that balances the need for training data with the necessity of a robust test set.
  2. Only after separating out the test data did we apply k-fold CV to the training dataset for hyperparameter tuning. This approach allowed us to iteratively train and validate our models on different subsets of the training data, optimizing hyperparameters to enhance model performance without any exposure to the test data. By employing k-fold CV only within the training set, we mitigated the risk of overfitting and ensured that the hyperparameter tuning process did not incorporate information from the test set.
  3. Finally, after completing the hyperparameter tuning through k-fold CV on the training set, we evaluated the performance of the optimized models on the independent test set. This step confirmed the generalizability of our models, as the test set had not been used in any part of the model training or tuning processes.
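A minimal sketch of this three-step workflow is shown below, using an assumed hyperparameter grid and synthetic stand-in data; it illustrates the procedure described above rather than the authors' exact configuration.

    import numpy as np
    from sklearn.model_selection import train_test_split, GridSearchCV
    from sklearn.svm import SVC
    from sklearn.metrics import roc_auc_score

    # Synthetic stand-in data: 31 subjects, 3 features (illustrative only).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(31, 3))
    y = np.array([0, 1] * 15 + [0])

    # Step 1: hold out 30% of the data as an untouched test set.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.30, random_state=42, stratify=y
    )

    # Step 2: k-fold CV for hyperparameter tuning, on the training set only.
    search = GridSearchCV(
        SVC(kernel="rbf", probability=True),
        param_grid={"C": [0.1, 1, 10], "gamma": ["scale", 0.1]},
        cv=5,
        scoring="roc_auc",
    )
    search.fit(X_train, y_train)

    # Step 3: a single evaluation of the tuned model on the held-out test set.
    test_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])
    print(search.best_params_, round(test_auc, 3))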

We hope this detailed explanation addresses any concerns regarding potential overfitting or data leakage. Ensuring the integrity of the test data and the generalizability of our models is very important for the field, and we thank you for the opportunity to clarify the approach.

Point #5: Handling Categorical Features: The manuscript does not elaborate on how categorical features were handled during preprocessing and modeling. Clear documentation of feature encoding or transformation methods is necessary.

Response: Thank you for your question regarding the handling of categorical features in the dataset. We understand the importance of clear documentation of feature encoding or transformation methods, as these steps are crucial for the preparation of the dataset for modeling. Here is a detailed explanation of how categorical features were managed during our preprocessing and modeling stages:

  1. The 'gender' variable was encoded as a binary categorical feature, with 0 representing male (M) and 1 representing female (F). This encoding was straightforward, given the binary nature of the aggregated data, and was directly utilized in our machine learning models without further transformation.
  2. The ‘is_healthy’ variable was also a binary categorical feature indicating the health status of the subjects (0 for patients with Parkinson’s disease and 1 for healthy subjects). This binary encoding was chosen to enable clear distinction and modeling of health status as a response variable in predictive analyses.
  3. Although the 'UPDRS Group' variable was included in our dataset for a general overview, it was not used during the training of our models. The decision to exclude this variable from training was made because the UPDRS Group is related to the diagnosis and severity of Parkinson's disease, which could introduce bias into the model when predicting the health status based on broader, non-diagnostic features. Its primary role was to provide context and depth to the clinical profile of the PD patients for the readers and was not intended as a feature for prediction.
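The encoding described in points 1 and 2 amounts to a simple binary mapping; a sketch of what it could look like, with hypothetical raw values and a hypothetical 'group' source column, is:

    import pandas as pd

    # Hypothetical raw records; the paper's actual schema may differ.
    df = pd.DataFrame({
        "gender": ["M", "F", "F", "M"],
        "group":  ["PD", "HS", "HS", "PD"],
    })

    # Binary encodings as described: gender -> 0 (M) / 1 (F),
    # is_healthy -> 0 (PD patient) / 1 (healthy subject).
    df["gender"] = df["gender"].map({"M": 0, "F": 1})
    df["is_healthy"] = df["group"].map({"PD": 0, "HS": 1})
    print(df)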

Thank you for the opportunity to clarify this point. We greatly appreciate the time and effort that the Reviewer put into reviewing our work. We believe that this clarification strengthens the manuscript by providing a more nuanced understanding of the complexities in applying ML and digital biomarkers for the detection of NDs.

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The authors have addressed all my concerns. 
