Exploring New Horizons: fNIRS and Machine Learning in Understanding Post-COVID-19
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The manuscript reports the application of several machine learning methods to post-COVID-19 conditions using a large functional near-infrared spectroscopy (fNIRS) dataset. Although the study sounds interesting, I do not feel that it has adequately taken advantage of this large dataset or that it contributes much new information to machine learning.
- The study collected samples from 37 participants (9 post-COVID-19, 28 controls) and then applied machine learning models to predict whether these participants have post-COVID conditions. The number of participants is very small, so the sample may be neither representative nor suitable for machine learning models.
- The case and control groups are not well matched. Given the differences between cases and controls, predicting post-COVID status is not difficult, which reduces the need for the fNIRS dataset and machine learning models.
- The study collected a large number of fNIRS samples, which may provide very useful information for studying patients with post-COVID conditions. However, using such a large number of samples to predict the status of a small number of participants seems to misuse the dataset. Even though the performance of the machine learning models is excellent, other labs are unlikely to be able to use the method if such large numbers of samples are required.
- It would be more meaningful to make predictions from a small number of fNIRS samples. If a small set of samples predicts post-COVID conditions well, those samples could serve as a signature, or key features, of the health outcome, providing more important information for diagnosis. It is therefore important for this study to focus on identifying key features within the large set of samples.
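As a rough illustration of the kind of feature-level analysis suggested here, the sketch below summarizes each candidate feature once per subject and ranks features by the absolute standardized mean difference (Cohen's d) between post-COVID cases and controls. It is a minimal, model-agnostic sketch under stated assumptions: the subject-level summaries, the 0/1 label coding, and the feature names are all hypothetical, and a real analysis would also need multiple-comparison control and confirmation on held-out subjects.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(cases, controls):
    """Standardized mean difference using the pooled sample standard deviation."""
    n_a, n_b = len(cases), len(controls)
    pooled_var = ((n_a - 1) * stdev(cases) ** 2 + (n_b - 1) * stdev(controls) ** 2) / (n_a + n_b - 2)
    return (mean(cases) - mean(controls)) / sqrt(pooled_var)


def rank_features(features_by_subject, labels):
    """Rank features by |Cohen's d|, largest first.

    features_by_subject: {subject_id: {feature_name: subject-level value}}  (hypothetical layout)
    labels: {subject_id: 1 for post-COVID, 0 for control}                  (assumed coding)
    """
    names = next(iter(features_by_subject.values())).keys()
    scores = {}
    for name in names:
        cases = [feats[name] for sid, feats in features_by_subject.items() if labels[sid] == 1]
        controls = [feats[name] for sid, feats in features_by_subject.items() if labels[sid] == 0]
        scores[name] = cohens_d(cases, controls)
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Usage with made-up subject-level values for two hypothetical features.
subjects = {
    "s1": {"ch01_hbo_mean": 5.0, "ch02_hbo_slope": 1.0},
    "s2": {"ch01_hbo_mean": 6.0, "ch02_hbo_slope": 2.0},
    "s3": {"ch01_hbo_mean": 5.0, "ch02_hbo_slope": 1.0},
    "s4": {"ch01_hbo_mean": 6.0, "ch02_hbo_slope": 2.0},
    "s5": {"ch01_hbo_mean": 1.0, "ch02_hbo_slope": 2.0},
    "s6": {"ch01_hbo_mean": 2.0, "ch02_hbo_slope": 1.0},
    "s7": {"ch01_hbo_mean": 1.0, "ch02_hbo_slope": 2.0},
    "s8": {"ch01_hbo_mean": 2.0, "ch02_hbo_slope": 1.0},
}
labels = {"s1": 1, "s2": 1, "s3": 1, "s4": 1, "s5": 0, "s6": 0, "s7": 0, "s8": 0}
ranking = rank_features(subjects, labels)
```

Working at the subject level (one value per subject per feature) keeps the comparison from being inflated by thousands of correlated windows drawn from the same few people.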
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
This manuscript investigates the integration of portable functional near-infrared spectroscopy (fNIRS) with machine learning to detect neural correlates of post-COVID-19. A dataset of 37 participants (9 post-COVID, 28 controls) was segmented into 29,737 labeled time-series samples, with four feature representation strategies (raw, PCA, statistical, and hybrid) compared across six classifiers. The best performance was reported for an SVM trained on hybrid features (AUC = 0.91).
The study is promising and potentially impactful, but substantial methodological clarification and reanalysis are required, particularly regarding preprocessing, feature extraction pipelines, and validation strategy.
- Risk of Information Leakage with PCA and Scaling.
The manuscript reports that PCA was applied globally before cross-validation, which introduces information leakage and optimistic bias. All transformations that learn parameters from the data (scaling, PCA, feature selection) must be fit only on the training folds and then applied to the test folds.
Re-run analyses using proper pipelines (e.g., StandardScaler → PCA → classifier) within each cross-validation fold. This is a well-established best practice. Global PCA, if retained, should only appear as a supplementary, exploratory analysis.
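A minimal sketch of the fold-wise pattern, using only the Python standard library: a toy standardizer learns its means and standard deviations from the training split of each fold and is then applied, unchanged, to that fold's test split. The class and function names are hypothetical stand-ins; PCA or feature selection would follow the same fit-on-train rule. In scikit-learn, wrapping StandardScaler, PCA, and the classifier in a Pipeline and passing that pipeline to the cross-validation routine has the same effect, because the whole pipeline is refit in every fold.

```python
from statistics import mean, pstdev


class FoldStandardizer:
    """Toy scaler: parameters are learned in fit() and only reused in transform()."""

    def fit(self, rows):
        columns = list(zip(*rows))
        self.means = [mean(col) for col in columns]
        self.stds = [pstdev(col) or 1.0 for col in columns]  # guard against zero variance
        return self

    def transform(self, rows):
        return [[(x - m) / s for x, m, s in zip(row, self.means, self.stds)] for row in rows]


def fold_indices(n_rows, n_folds):
    """Yield (train, test) index lists for simple interleaved folds."""
    for k in range(n_folds):
        test = [i for i in range(n_rows) if i % n_folds == k]
        train = [i for i in range(n_rows) if i % n_folds != k]
        yield train, test


def preprocess_per_fold(X, n_folds=5):
    """Fit the scaler on each training split only, then apply it to both splits of that fold."""
    folds = []
    for train, test in fold_indices(len(X), n_folds):
        scaler = FoldStandardizer().fit([X[i] for i in train])  # test rows are never seen here
        folds.append({
            "scaler": scaler,
            "X_train": scaler.transform([X[i] for i in train]),
            "X_test": scaler.transform([X[i] for i in test]),
        })
    return folds


# Toy data: one feature, ten rows. Rows 0 and 5 form the test split of fold 0.
X = [[float(i)] for i in range(10)]
folds = preprocess_per_fold(X)
X_shifted = [row[:] for row in X]
X_shifted[0], X_shifted[5] = [1000.0], [1000.0]
folds_shifted = preprocess_per_fold(X_shifted)
```

Changing the values of the held-out rows leaves fold 0's fitted parameters untouched, which is exactly the property a globally fitted PCA or scaler lacks.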
- Fragmentation of the Time Series.
Nearly 30,000 “samples” are derived from just 37 subjects. Metrics reported at the fragment level are vulnerable to autocorrelation.
*Provide a clear description of windowing (duration, stride, overlap).
*Report subject-level results (e.g., majority vote or averaged probabilities per subject) with confidence intervals.
*Clarify whether tapping vs rest periods were analyzed separately or pooled under a single subject-level label.
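The three requests above can be sketched together as follows. The window length, stride, and threshold are placeholders rather than the manuscript's settings; folds are built from whole subjects so that autocorrelated windows from one person never appear on both sides of a split; and window-level probabilities are averaged into one prediction per subject. A Wilson score interval is shown as one common choice of confidence interval for subject-level accuracy; it is an assumption here, not a method named in the review.

```python
from math import sqrt
from statistics import mean


def window_starts(n_samples, window, stride):
    """Start indices of fixed-length windows; consecutive windows overlap by window - stride samples."""
    return list(range(0, n_samples - window + 1, stride))


def subject_folds(subject_ids, n_folds):
    """Assign whole subjects to folds so that no subject's windows fall in both train and test."""
    subjects = sorted(set(subject_ids))
    fold_of = {s: k % n_folds for k, s in enumerate(subjects)}
    return [[i for i, s in enumerate(subject_ids) if fold_of[s] == k] for k in range(n_folds)]


def subject_predictions(subject_ids, window_probs, threshold=0.5):
    """Average window-level probabilities per subject, then apply the threshold once per subject."""
    by_subject = {}
    for sid, prob in zip(subject_ids, window_probs):
        by_subject.setdefault(sid, []).append(prob)
    return {sid: (mean(p), int(mean(p) >= threshold)) for sid, p in by_subject.items()}


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion such as subject-level accuracy."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


# Hypothetical example: a 100-sample recording cut into 20-sample windows with 50% overlap.
starts = window_starts(100, window=20, stride=10)
preds = subject_predictions(["a", "a", "b", "b"], [0.9, 0.7, 0.2, 0.4])
```

With only 37 subjects, such an interval on subject-level accuracy will be wide, which is itself useful information for readers.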
- Imbalanced classes.
Although the manuscript reports standard metrics such as accuracy, sensitivity, specificity, PPV, NPV, and AUC, these are more informative under balanced class conditions. Given the imbalance of 9 post-COVID vs 28 controls (≈1:3 samples), I strongly recommend adding imbalance-robust metrics such as PR-AUC, class-wise F1-scores, and MCC, which are not currently reported.
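For reference, the suggested metrics can be computed directly from labels and scores. This stdlib-only sketch assumes post-COVID is coded as 1 and controls as 0; class-wise F1 swaps the roles of the confusion-matrix cells for the control class, and PR-AUC is computed as average precision (the mean precision at each rank where a positive case is retrieved, without special handling of tied scores).

```python
from math import sqrt


def confusion(y_true, y_pred):
    """Counts (tp, tn, fp, fn) with 1 = post-COVID (assumed coding) and 0 = control."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def f1(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when any margin of the confusion matrix is empty."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def average_precision(y_true, scores):
    """Step-wise PR-AUC: mean precision at the rank of each positive, scores sorted high to low."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(y_true)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / n_pos if n_pos else 0.0


def imbalance_report(y_true, y_pred):
    tp, tn, fp, fn = confusion(y_true, y_pred)
    return {
        "f1_post_covid": f1(tp, fp, fn),  # positive class
        "f1_control": f1(tn, fn, fp),     # roles of the cells swap for the negative class
        "mcc": mcc(tp, tn, fp, fn),
    }


# Toy labels and predictions, not results from the manuscript.
report = imbalance_report([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1])
ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
```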
- Inconsistencies Between Results and Narrative:
- The abstract and results highlight SVM (AUC 0.91) as best, but Figure 6 suggests KNN as “best” for hybrid features, contradicting the text.
Define a primary metric and ensure consistency across abstract, tables, and figures.
- Demographics and Clinical Table (Table 2)
- The p-value for MoCA is inconsistent (0.0044 in text vs 0.0297 in Table 2).
- “Days post-infection” is unclear for controls (who were SARS-CoV-2 negative).
Several preprocessing choices are mentioned but not fully explained. For example, the band-pass filter range, the use of a partial pathlength factor of 1, the decision to analyze only HbO, and the handling of short-separation channels are not justified in detail.
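To make the partial pathlength factor (PPF) question concrete: under the modified Beer–Lambert law, stated here in a common textbook form that is not necessarily the exact form used in the manuscript, the optical density change at each of two wavelengths depends on the HbO and HbR concentration changes, the source–detector distance d, and the PPF. Solving the two-wavelength system with an assumed PPF gives estimates scaled by the ratio of true to assumed PPF, provided the true PPF is the same at both wavelengths:

```latex
\begin{aligned}
\Delta\mathrm{OD}(\lambda_i) &= \bigl[\varepsilon_{\mathrm{HbO}}(\lambda_i)\,\Delta c_{\mathrm{HbO}}
  + \varepsilon_{\mathrm{HbR}}(\lambda_i)\,\Delta c_{\mathrm{HbR}}\bigr]\, d\,\mathrm{PPF}_{\mathrm{true}},
  \qquad i = 1, 2, \\
E &= \begin{pmatrix}
  \varepsilon_{\mathrm{HbO}}(\lambda_1) & \varepsilon_{\mathrm{HbR}}(\lambda_1) \\
  \varepsilon_{\mathrm{HbO}}(\lambda_2) & \varepsilon_{\mathrm{HbR}}(\lambda_2)
\end{pmatrix}, \\
\begin{pmatrix} \widehat{\Delta c}_{\mathrm{HbO}} \\ \widehat{\Delta c}_{\mathrm{HbR}} \end{pmatrix}
  &= \frac{1}{d\,\mathrm{PPF}_{\mathrm{assumed}}}\, E^{-1}
     \begin{pmatrix} \Delta\mathrm{OD}(\lambda_1) \\ \Delta\mathrm{OD}(\lambda_2) \end{pmatrix}
   = \frac{\mathrm{PPF}_{\mathrm{true}}}{\mathrm{PPF}_{\mathrm{assumed}}}
     \begin{pmatrix} \Delta c_{\mathrm{HbO}} \\ \Delta c_{\mathrm{HbR}} \end{pmatrix}.
\end{aligned}
```

With the PPF set to 1, HbO and HbR amplitudes are therefore reported in units scaled by an unknown, possibly subject-dependent factor, which is why a justification for this choice is worth giving.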
Please provide a clear justification for these preprocessing parameters and choices, and, if possible, indicate whether alternative settings were tested to confirm the robustness of the results.
- Hyperparameter Tuning
It is unclear whether hyperparameters were tuned (e.g., for SVM, KNN, and XGBoost). Without tuning, results may be suboptimal.
Implement grid or random search within the training folds only (i.e., nested cross-validation), and report the search spaces and selected values.
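The structure of tuning inside the training folds (nested cross-validation) is sketched below with a toy k-nearest-neighbors classifier whose neighborhood size k stands in for any hyperparameter. Everything here is hypothetical and stdlib-only: the inner loop sees only the outer training data when choosing k, and each outer test fold is used once, to score the chosen setting. In scikit-learn, the same structure arises from passing a GridSearchCV object as the estimator to cross_val_score; with fNIRS windows, the splits should additionally be grouped by subject.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, x, k):
    """Majority label among the k nearest training points (Euclidean distance)."""
    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k):
    hits = sum(knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y))
    return hits / len(test_y)


def folds(n_rows, n_folds):
    """Yield (train, test) index lists for interleaved folds."""
    for f in range(n_folds):
        yield ([i for i in range(n_rows) if i % n_folds != f],
               [i for i in range(n_rows) if i % n_folds == f])


def take(seq, idx):
    return [seq[i] for i in idx]


def nested_cv(X, y, k_grid, outer=5, inner=3):
    """Inner folds (outer-training data only) choose k; outer folds score the chosen k."""
    results = []
    for tr, te in folds(len(X), outer):
        X_tr, y_tr = take(X, tr), take(y, tr)

        def inner_score(k):
            scores = [accuracy(take(X_tr, a), take(y_tr, a), take(X_tr, b), take(y_tr, b), k)
                      for a, b in folds(len(X_tr), inner)]
            return sum(scores) / len(scores)

        best_k = max(k_grid, key=inner_score)  # the search never touches the outer test fold
        results.append((best_k, accuracy(X_tr, y_tr, take(X, te), take(y, te), best_k)))
    return results


# Toy, separable 1-D data; k_grid is a hypothetical search space.
X = [[float(i)] for i in range(20)]
y = [0] * 10 + [1] * 10
results = nested_cv(X, y, k_grid=[1, 3, 5])
```

Reporting the selected k for every outer fold, as the results list does, is one way to show the selected values the review asks for.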
- Metrics and class imbalance
Reported sensitivity is modest (40–60%) despite the high AUC, raising questions about clinical applicability. The class imbalance (6,084 post-COVID vs 23,655 control samples) is also a concern.
Although the manuscript reports standard metrics such as accuracy, sensitivity, specificity, PPV, NPV, and AUC, these are more informative under balanced class conditions. Given the imbalance of 9 post-COVID vs 28 controls (≈1:3 samples), I strongly recommend adding imbalance-robust metrics such as PR-AUC (Precision-Recall AUC), F1-scores, and MCC (Matthews Correlation Coefficient), which are not currently reported.
Include imbalance-robust metrics. Report sensitivity/specificity trade-offs at clinically relevant thresholds.
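Reporting operating points could look like the sketch below, which lists sensitivity and specificity at every distinct score threshold and picks the highest threshold that reaches a target sensitivity, and therefore the best specificity subject to that target. The target value and toy scores are placeholders; in practice the threshold should be chosen on training or validation data and then applied to the held-out test data.

```python
def sens_spec(y_true, scores, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive (1 = post-COVID)."""
    tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= threshold)
    fn = sum(1 for t, s in zip(y_true, scores) if t == 1 and s < threshold)
    tn = sum(1 for t, s in zip(y_true, scores) if t == 0 and s < threshold)
    fp = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def operating_points(y_true, scores):
    """(threshold, sensitivity, specificity) at every distinct score, from strictest to loosest."""
    return [(thr, *sens_spec(y_true, scores, thr)) for thr in sorted(set(scores), reverse=True)]


def threshold_for_sensitivity(y_true, scores, target):
    """Highest threshold whose sensitivity reaches the target, or None if none does."""
    for thr, sens, spec in operating_points(y_true, scores):
        if sens >= target:
            return thr, sens, spec
    return None


# Toy scores, not results from the manuscript.
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.5, 0.3, 0.2, 0.1]
point = threshold_for_sensitivity(y_true, scores, target=1.0)
```

Because sensitivity can only rise and specificity can only fall as the threshold is lowered, the first threshold that meets the target is also the one with the highest specificity among those that do.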
- Novelty Claim
The statement that this is “the first application of fNIRS targeting the motor cortex in post-COVID” should be softened unless supported by a systematic literature scan.
Rephrase to “to our knowledge…” or cite a structured search.
- Reproducibility
The manuscript mentions “transparent methods and shared resources” (Table 1) but provides no link to a repository.
Provide code/notebooks (pipelines, requirements, seeds) and derived, anonymized data in a public repository (e.g., GitHub, Zenodo).
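As one small, hypothetical illustration of what “seeds” in a shared repository can mean in practice, the sketch below fixes the random seed and builds a run manifest with the Python version, platform, seed, and analysis configuration, to be saved alongside the results. The configuration keys are placeholders; if NumPy, scikit-learn, or XGBoost are used, their own random states must be seeded as well.

```python
import json
import platform
import random


def set_seed(seed):
    """Seed Python's own RNG; other libraries in the pipeline need their own seeding."""
    random.seed(seed)


def run_manifest(seed, config):
    """Collect what a reader needs to rerun the analysis with the same settings."""
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "seed": seed,
        "config": config,  # hypothetical keys, e.g. window length, CV scheme, model grid
    }


SEED = 42
set_seed(SEED)
manifest = run_manifest(SEED, {"cv": "5-fold, grouped by subject", "models": ["SVM", "KNN"]})
print(json.dumps(manifest, indent=2))
```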
Additional comments:
- Standardize abbreviations (e.g., CV5, 5-fold cross-validation, AUC-ROC).
- Report sample sizes per test in Table 2.
- Ensure consistent decimals across tables and figures.
- Refine abstract wording: “balanced sensitivity and specificity” is misleading when sensitivity is as low as 60% for SVM and 40–50% for most methods.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
The authors addressed several of my comments; however, they did not respond to my primary concern. The manuscript applies a few traditional machine learning models and compares their performance but does not contribute new methodological knowledge. I previously recommended a deeper analysis of the dataset to identify informative features or signatures of the health outcome, but the authors did not carry it out. Without such an analysis, the work provides limited insight beyond a basic benchmarking exercise.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
The authors’ responses are satisfactory; I have no further comments.
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Round 3
Reviewer 1 Report
Comments and Suggestions for Authors
The authors addressed my concerns. No additional comments.

