Supporting ASD Diagnosis with EEG, ML and Swarm Intelligence: Early Detection of Autism Spectrum Disorder Based on Electroencephalography Analysis by Machine Learning and Swarm Intelligence
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
- This study used a machine learning method for diagnosing ASD using EEG signals. The Sheffield dataset includes data from 28 individuals with autism spectrum conditions (ASC) and 28 neurotypical controls, aged between 18 and 68 years. Please clarify this in the article.
- Machine learning models typically require hundreds to thousands of samples to generalize well. Please justify using only 56 samples.
- The dataset has balanced groups (e.g., autistic vs. neurotypical). However, 28 subjects per class seems too small for ML.
- Please calculate your statistical power for the sample size of 56.
- Did you ever look at a large-scale study titled "Lack of univariate, clinically-relevant biomarkers of autism in resting state EEG: a study of 776 participants"? This study combined data from multiple datasets, including the dataset you used, to analyze resting-state EEG data from 776 participants.
- The labels in the dataset are binary (ASD vs. control), but ASD is a spectrum. Did you consider this in the ML algorithms?
Author Response
Comment 1: This study used a machine learning method for diagnosing ASD using EEG signals. The Sheffield dataset includes data from 28 individuals with autism spectrum conditions (ASC) and 28 neurotypical controls, aged between 18 and 68 years. Please clarify this in the article.
Response 1: Thank you for your valuable comment. We have clarified the composition of the dataset in the Materials and Methods – Database section of the revised manuscript. Specifically, we now explicitly state that the Sheffield dataset (Dataset 1) comprises 28 individuals diagnosed with ASC and 28 neurotypical controls, totaling 56 participants. This information has also been added briefly in the Introduction to improve transparency and clarity regarding the sample size and group distribution.
Comment 2: Machine learning models typically require hundreds to thousands of samples to generalize well. Please justify using only 56 samples.
Response 2: We appreciate your observation. While it is true that larger datasets are generally preferred for robust ML model training and generalization, our work is based on a publicly available dataset (the Sheffield dataset), which currently contains only 56 participants. We recognize this as a limitation and have expanded the Limitations section to provide a more comprehensive justification:
- The Sheffield dataset is one of the few publicly accessible EEG datasets for ASD with clinical labels.
- Several recent studies in ASD diagnosis using EEG-based ML have employed similarly small sample sizes due to the limited availability of high-quality labeled data (e.g., Kang et al., 2020; Jayawardana et al., 2019; Abdolzadegan et al., 2020).
- To mitigate overfitting and improve reliability, we applied rigorous cross-validation (10-fold × 30 repetitions; see the sketch below).
- Our goal is exploratory: to test the feasibility of our approach before applying it to larger, multi-site datasets in future work.
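As an illustration of this validation protocol, the following is a minimal sketch using scikit-learn (an assumption about tooling; the feature matrix, labels, and classifier here are placeholders, not the study's actual pipeline):

```python
# Minimal sketch of 10-fold cross-validation repeated 30 times.
# scikit-learn is assumed; X and y are placeholders for the real
# feature matrix and labels (56 subjects, 28 per class).
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.svm import SVC

X = np.random.randn(56, 20)   # placeholder features
y = np.repeat([0, 1], 28)     # 28 ASC vs. 28 control labels
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=30, random_state=0)
scores = cross_val_score(SVC(kernel="rbf"), X, y, cv=cv, scoring="accuracy")
print(f"accuracy: {scores.mean():.3f} ± {scores.std():.3f}")  # over 300 folds
```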
Comment 3: The dataset has balanced groups (e.g., autistic vs. neurotypical). However, 28 subjects per class seems too small for ML. Please calculate your statistical power for the sample size of 56.
Response 3: Thank you for raising this important point. We have performed a statistical power analysis assuming a medium effect size (Cohen’s d = 0.5), alpha = 0.05 (two-tailed).
- For a two-sample t-test comparing two independent means with n = 28 per group, the achieved power is approximately 78%.
- This level of power is considered acceptable for pilot or exploratory studies.
We have included these results in the Materials and Methods section of the revised manuscript, noting that while not ideal, the sample size provides reasonable power for an initial investigation.
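For readers who wish to reproduce the calculation, a minimal sketch is given below using statsmodels (an assumption; the response does not state which software was used). Note that the achieved power depends strongly on the assumed effect size:

```python
# Sketch of a two-sample t-test power analysis for n = 28 per group,
# alpha = 0.05, two-tailed; statsmodels is an assumed tool choice.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for d in (0.5, 0.65, 0.8):  # candidate effect sizes (Cohen's d)
    power = analysis.power(effect_size=d, nobs1=28, ratio=1.0,
                           alpha=0.05, alternative="two-sided")
    print(f"Cohen's d = {d:.2f}: achieved power = {power:.2f}")
```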
Comment 4: Did you ever look at a large-scale study titled "Lack of univariate, clinically-relevant biomarkers of autism in resting state EEG: a study of 776 participants"? This study combined data from multiple datasets, including the dataset you used, to analyze resting-state EEG data from 776 participants.
Response 4: Yes, we are aware of the study by Dede et al. (2023) titled "Lack of univariate, clinically-relevant biomarkers of autism in resting state EEG: a study of 776 participants." We thank you for pointing this out.
In that study, the authors concluded that univariate features (such as average band power) are insufficient for distinguishing ASD from controls. In contrast, our approach goes beyond univariate analysis by extracting multivariate features (e.g., Hjorth parameters, waveform length, entropy) and combining them with machine learning models and feature selection techniques (PSO and evolutionary search). These methods allow us to capture complex patterns across multiple EEG channels and time windows, which may be more sensitive than single-variable approaches.
We have added a discussion of this study in the Discussion section and emphasized how our multivariate ML-based pipeline offers a different perspective compared to traditional univariate biomarker analyses.
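For concreteness, a minimal sketch of the per-channel features named above (Hjorth parameters and waveform length) is shown below, assuming NumPy; the entropy measures and the paper's exact implementation are omitted:

```python
# Illustrative per-channel feature extraction; x is one EEG segment.
# This is a sketch, not the study's actual code.
import numpy as np

def hjorth_and_wl(x):
    dx = np.diff(x)    # first discrete derivative
    ddx = np.diff(dx)  # second discrete derivative
    activity = np.var(x)                                       # Hjorth activity
    mobility = np.sqrt(np.var(dx) / np.var(x))                 # Hjorth mobility
    complexity = np.sqrt(np.var(ddx) / np.var(dx)) / mobility  # Hjorth complexity
    waveform_length = np.sum(np.abs(dx))                       # cumulative variation
    return activity, mobility, complexity, waveform_length

features = hjorth_and_wl(np.random.randn(1280))  # e.g., a 5 s window at 256 Hz
```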
Comment 5: The labels in the dataset are binary (ASD vs. control), but ASD is a spectrum. Did you consider this in the ML algorithms?
Response 5: You are absolutely correct: ASD is a spectrum disorder with significant heterogeneity among individuals. However, the dataset we used (the Sheffield dataset) provides only binary labels: ASD or neurotypical control. That said, since the acquisition methodology employs EEG, it is probable that the autistic individuals are at support level 1.
We acknowledge this limitation and have added the following points to the Limitations and Future Work sections:
- Our current analysis uses the available binary classification due to dataset constraints.
- In future work, we plan to explore multi-dimensional labels (e.g., symptom severity scores such as ADOS, CARS, AQ) when such data become available.
- We will also investigate regression-based models or clustering techniques to better capture the dimensional nature of ASD.
- Additionally, we aim to collect and build a proprietary EEG database with richer clinical metadata to support more nuanced modeling.
Reviewer 2 Report
Comments and Suggestions for Authors
Supporting ASD Diagnosis with EEG, ML and Swarm Intelligence: Early Detection of Autism Spectrum Disorder Based on Electroencephalography Analysis by Machine Learning and Swarm Intelligence
Fonseca et al.
The authors deal with the problem of diagnosing autism based on electroencephalogram (EEG) signals. Evolutionary search methods and machine learning models are applied, and the authors claim high-performance classification results. The authors motivate the work by emphasizing the importance of early detection, which can be difficult.
The authors mention that both EEG signals and machine learning have already been used by earlier researchers. They mention several approaches. It would be better if they could also make clear exactly how this work stands among them by comparing and contrasting with earlier work, rather than just mentioning earlier work.
The dataset they use has 56 participants. Is this sufficient to draw the conclusions given? What could be the limitations?
Table captions should be longer, providing a complete description of the data included in each table.
The discussion section mostly describes the data but does less to analyze and synthesize it into broader conclusions. A stronger effort should be made to extract as much interpretation as possible from the data.
The conclusion should include some numerical percentages of performance.
With reasonable revision along the above lines, the manuscript may be worth publishing.
Author Response
Comment 1: The authors mention both EEG signals and machine learning have already been used by earlier researchers. They mention several approaches. It would be better if they could also make it clear exactly how this work stands among them by comparing and contrasting with earlier work, rather than just mentioning earlier work.
Response 1:
We thank the reviewer for this important suggestion. In the revised manuscript, we have enhanced the "Related Works" section (Section 1.2) to include a comparative analysis between our approach and previously published studies. Specifically:
- We now compare our method (EEG-based ML with PSO and evolutionary feature selection) with other works that use:
  - multimodal data fusion (e.g., Kang et al., 2020),
  - deep learning models (e.g., Baygin et al., 2021; Radhakrishnan et al., 2021),
  - feature engineering + classical ML (e.g., Abdolzadegan et al., 2020).
- We highlight how our study contributes by combining swarm intelligence and evolutionary search with traditional ML models to optimize performance (a schematic sketch of the PSO step appears after this response).
- We emphasize that our methodology focuses on feature reduction and interpretability, which is often overlooked in deep learning-based approaches.
This comparison allows readers to clearly understand where our work fits within the current state-of-the-art and what novel contributions it brings.
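To make the swarm-intelligence step concrete, below is a minimal sketch of binary PSO over feature masks; the sigmoid transfer function, fitness definition, and parameters are illustrative assumptions rather than the study's exact configuration:

```python
# Schematic binary PSO for feature selection: each particle encodes a
# candidate feature subset as a bit mask, and velocities are mapped
# through a sigmoid to bit-activation probabilities (a common binary
# PSO variant). This is a sketch, not the paper's implementation.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def fitness(mask, X, y):
    # Mean CV accuracy on the selected feature subset; empty masks score 0.
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(SVC(kernel="rbf"), X[:, mask.astype(bool)], y, cv=5).mean()

def binary_pso(X, y, n_particles=20, n_iter=30, w=0.7, c1=1.5, c2=1.5):
    n_feat = X.shape[1]
    pos = rng.integers(0, 2, size=(n_particles, n_feat))   # bit masks
    vel = rng.uniform(-1.0, 1.0, size=(n_particles, n_feat))
    pbest = pos.copy()
    pbest_fit = np.array([fitness(p, X, y) for p in pos])
    gbest = pbest[pbest_fit.argmax()].copy()
    for _ in range(n_iter):
        r1 = rng.random((n_particles, n_feat))
        r2 = rng.random((n_particles, n_feat))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        prob = 1.0 / (1.0 + np.exp(-vel))                  # sigmoid transfer
        pos = (rng.random((n_particles, n_feat)) < prob).astype(int)
        fit = np.array([fitness(p, X, y) for p in pos])
        improved = fit > pbest_fit
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]
        gbest = pbest[pbest_fit.argmax()].copy()
    return gbest.astype(bool)

# Usage: selected = binary_pso(X, y), with X (subjects × features) and binary y.
```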
Comment 2: The dataset they use has 56 participants. Is this sufficient to draw the conclusions given? What could be the limitations?
Response 2:
We acknowledge the concern regarding the relatively small sample size. While 56 participants may seem limited, especially in machine learning applications, we justify its use based on the following:
- The Sheffield dataset is one of the few publicly available EEG datasets with clinical labels for ASD.
- Many recent studies in the field also work with similar or smaller sizes due to limited access to high-quality labeled EEG data [e.g., Jayawardana et al., 2019; Abdolzadegan et al., 2020].
- To mitigate overfitting and improve reliability, we applied rigorous cross-validation techniques (10-fold × 30 repetitions).
- We performed a statistical power analysis (see Limitations), showing that with n = 56, we achieved ~78% power to detect moderate effect sizes.
Nonetheless, we fully recognize the limitations of small samples and have added an expanded Limitations section, explicitly stating that:
- Our findings should be considered exploratory.
- Larger, multi-site datasets are needed for generalization.
- Future work will involve collecting a larger, more diverse dataset with richer clinical metadata.
Comment 3: Table captions should be longer, providing a complete description of the data included in that table.
Response 3:
Thank you for pointing this out. We have revised all table captions to be more descriptive and informative. For example:
Table 2 – Before:
"Results of the training and validation stage for the dataset from original signals."
Table 2 – After:
"Performance metrics (Accuracy, Kappa Statistic, Sensitivity, Specificity, AUC) of various classifiers trained on the full EEG dataset without feature selection. Results represent average values ± standard deviation across 30 independent runs using 10-fold cross-validation."
Each table caption has been similarly updated to describe:
- What data is shown,
- How it was obtained,
- What units or statistical measures are used.
Comment 4: The discussion section mostly describes the data but does less to analyze and synthesize it into broader conclusions. A stronger effort should be made to extract as much interpretation as possible from the data.
Response 4:
We appreciate the reviewer’s observation. The Discussion section (Section 4) has been significantly expanded and restructured to go beyond mere description of results. Key improvements include:
- Synthesis of key findings across different classification strategies (original, PSO, evolutionary search).
- Interpretation of model performance differences, e.g., why SVM with RBF kernel outperformed others in certain conditions.
- Comparison of feature selection methods: why PSO improved efficiency while maintaining accuracy, and how evolutionary search led to simpler but still effective models (a schematic sketch of such a search follows below).
- Implications for future research, such as:
  - feasibility of deploying these models in real-world diagnostic tools,
  - trade-offs between model complexity and performance,
  - the need for multimodal integration (EEG + behavioral data).
These additions allow for deeper insight into the significance of the results and their implications for the field.
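Similarly, a schematic sketch of an evolutionary (genetic) search over feature masks is given below; the tournament selection, uniform crossover, and bit-flip mutation operators are illustrative assumptions, not the paper's exact implementation (Random Forest is used here only because it is the model reported alongside evolutionary search above):

```python
# Schematic genetic-algorithm feature selection over bit masks.
# This is a sketch under stated assumptions, not the study's code.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

def fitness(mask, X, y):
    # Mean CV accuracy on the selected feature subset; empty masks score 0.
    if mask.sum() == 0:
        return 0.0
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    return cross_val_score(clf, X[:, mask.astype(bool)], y, cv=5).mean()

def evolutionary_search(X, y, pop_size=20, n_gen=30, p_mut=0.05):
    n_feat = X.shape[1]
    pop = rng.integers(0, 2, size=(pop_size, n_feat))
    for _ in range(n_gen):
        fit = np.array([fitness(ind, X, y) for ind in pop])
        # Tournament selection: keep the fitter of two random individuals.
        idx = rng.integers(0, pop_size, size=(pop_size, 2))
        winners = np.where(fit[idx[:, 0]] > fit[idx[:, 1]], idx[:, 0], idx[:, 1])
        parents = pop[winners]
        # Uniform crossover between consecutive parent pairs.
        cross = rng.integers(0, 2, size=(pop_size, n_feat)).astype(bool)
        children = np.where(cross, parents, np.roll(parents, 1, axis=0))
        # Bit-flip mutation.
        flips = rng.random((pop_size, n_feat)) < p_mut
        pop = np.where(flips, 1 - children, children)
    fit = np.array([fitness(ind, X, y) for ind in pop])
    return pop[fit.argmax()].astype(bool)
```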
Comment 5: The conclusion should include some numerical percentages of performance.
Response 5:
Thank you for this suggestion. The Conclusion section (Section 5) has been rewritten to include specific performance metrics, such as:
"Our best-performing model, the SVM with RBF kernel after PSO-based feature selection, achieved an accuracy of 99.23%, sensitivity of 99%, specificity of 99%, and an AUC of 0.99. These results suggest that EEG-based machine learning can achieve high classification accuracy even with a relatively small dataset when combined with effective preprocessing and feature optimization techniques."
We have also summarized the performance of the Random Forest model after evolutionary search, which reached 93.91% accuracy.