Article

Supporting ASD Diagnosis with EEG, ML and Swarm Intelligence: Early Detection of Autism Spectrum Disorder Based on Electroencephalography Analysis by Machine Learning and Swarm Intelligence

by
Flávio Secco Fonseca
1,2,†,
Adrielly Sayonara de Oliveira Silva
2,†,
Maria Vitória Soares Muniz
2,†,
Catarina Victória Nascimento de Oliveira
2,†,
Arthur Moreira Nogueira de Melo
2,†,
Maria Luísa Mendes de Siqueira Passos
3,†,
Ana Beatriz de Souza Sampaio
2,†,
Thailson Caetano Valdeci da Silva
2,†,
Alana Elza Fontes da Gama
2,†,
Ana Cristina de Albuquerque Montenegro
4,†,
Bianca Arruda Manchester de Queiroga
4,
Marilú Gomes Netto Monte da Silva
2,†,
Rafaella Asfora Siqueira Campos Lima
5,†,
Sadi da Silva Seabra Filho
6,†,
Shirley da Silva Jacinto de Oliveira Cruz
7,†,
Cecília Cordeiro da Silva
2,†,
Clarisse Lins de Lima
2,†,
Giselle Machado Magalhães Moreno
2,†,
Maíra Araújo de Santana
3,†,
Juliana Carneiro Gomes
2,† and
Wellington Pinheiro dos Santos
1,2,*
1
Núcleo de Engenharia da Computação, Escola Politécnica de Pernambuco, Universidade de Pernambuco, Recife 50720-001, Brazil
2
Departamento de Engenharia Biomédica, Universidade Federal de Pernambuco, Recife 50740-550, Brazil
3
Centro de Informática, Universidade Federal de Pernambuco, Recife 50740-560, Brazil
4
Departamento de Fonoaudiologia, Universidade Federal de Pernambuco, Recife 50740-520, Brazil
5
Centro de Educação, Universidade Federal de Pernambuco, Recife 50670-901, Brazil
6
Departamento de Expressão Gráfica, Universidade Federal de Pernambuco, Recife 50740-550, Brazil
7
Hospital das Clínicas da Universidade Federal de Pernambuco, Empresa Brasileira de Serviços Hospitalares, Recife 50670-901, Brazil
*
Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
AI Sens. 2025, 1(1), 3; https://doi.org/10.3390/aisens1010003
Submission received: 26 April 2025 / Revised: 5 June 2025 / Accepted: 13 June 2025 / Published: 24 June 2025

Abstract

Deficits in social interaction and communication characterize Autism Spectrum Disorder (ASD). Although its symptoms are widely recognized, diagnosing ASD remains challenging because of its wide range of clinical presentations. Methods: In this study, we propose a method to support the early diagnosis of autism, which currently relies primarily on clinical assessment. Our approach aims at an early differential diagnosis based on electroencephalogram (EEG) signals, seeking to identify patterns associated with ASD. We used EEG data from 56 participants in the Sheffield dataset, comprising 28 individuals diagnosed with Autism Spectrum Conditions (ASC) and 28 neurotypical controls, and applied numerical interpolation to handle missing data. After a detailed analysis of the signals, we followed three approaches: one using the original dataset and two using the most significant attributes selected with PSO and evolutionary search methods. In each approach, we trained a series of machine learning models, several of which achieved high classification performance. Results: We obtained accuracies of 99.13% ± 0.44 for the dataset with original signals, 99.23% ± 0.38 after PSO-based feature selection, and 93.91% ± 1.10 after evolutionary search. These results were obtained with classical classifiers: SVM was the most effective in the first two approaches, while Random Forest with 500 trees proved more efficient in the third. Conclusions: Despite the limitations of the dataset, the experiments yielded promising results in identifying EEG patterns associated with Autism Spectrum Disorder. Finally, we emphasize that this work is the starting point of a larger project aimed at supporting and democratizing the diagnosis of ASD, both early in children and later in adults.

1. Introduction

1.1. Motivation and Problem Characterization

Over the past half century, our understanding of Autism Spectrum Disorder (ASD) has expanded significantly. ASD involves difficulties in communication and social interaction, often accompanied by repetitive behaviors [1]. This growing knowledge has shed light on the complexity of autism, which is now understood as a spectrum that manifests differently in each individual [2]. The term 'autism' has been used in different ways over time; it was formerly grouped under the broader category of 'Pervasive Developmental Disorders' [3]. ASD has long-lasting effects on individuals and their families and shapes the educational and therapeutic support they receive. In recent years, the number of autism diagnoses has been increasing [4].
Until 2013, autism was divided into five categories: Asperger's Syndrome, Rett Syndrome, Childhood Disintegrative Disorder, Autistic Disorder, and Pervasive Developmental Disorder Not Otherwise Specified (PDD-NOS). Today, ASD is diagnosed according to the DSM-5, which no longer uses subtypes; instead, it defines three levels based on the support needs and abilities of autistic people, which can vary and evolve over time. While some people with autism can live independently, others have severe disabilities and require lifelong care and support [5,6,7]. With this revision, the diagnostic criteria also changed, focusing on two main areas: social communication and repetitive behavior. An ASD diagnosis requires at least three social communication symptoms and at least two repetitive behavior symptoms [8]. In 2022, the WHO-approved ICD-11 came into effect for diagnosing ASD in the Brazilian Unified Health System (SUS). ICD-11 adopts the term "Autism Spectrum Disorder (ASD)" following DSM-5 standards, whereas the earlier ICD-10 used 'childhood autism' under Pervasive Developmental Disorders [9].
Signs of ASD usually appear early, sometimes before the age of two. Initially, however, they may resemble a general delay in motor and cognitive development, which makes early diagnosis difficult [10]. At present, ASD diagnosis is mainly clinical: multidisciplinary teams rely on observation and questionnaires to identify the condition [11,12]. Even so, diagnosis is often inaccurate and delayed. There is no specific treatment for ASD, but therapies can address its symptoms, aiming to improve communication, social skills, and the patient's overall quality of life. Better technologies are therefore needed to support early, accurate, and accessible ASD diagnosis.
Many studies have used EEG signals to support ASD diagnosis by examining brain activity. These signals originate from the small electrical currents of the human brain, voltage fluctuations produced by ionic currents flowing within and between neurons [13]. EEG signals contain different rhythms, including delta, theta, alpha, and beta waves [14]. To characterize brain activity, EEG is recorded from different regions of the scalp, covering both hemispheres. Electrodes are placed on the head following international positioning systems that cover the entire scalp, the most common being the 10-20 and 10-10 montages [15]. In recent years, EEG signals have become important for the early diagnosis of several diseases, and researchers have combined EEG with a variety of pattern identification strategies [16]. EEG is used to diagnose conditions such as brain edema, Parkinson's disease, and epilepsy, as well as to control brain–machine interfaces and to detect and classify emotions [17].
Furthermore, researchers have investigated machine learning techniques to build diagnostic support tools [11,12]. This branch of artificial intelligence comprises supervised and unsupervised learning. Several studies suggest that ASD diagnosis can benefit from supervised machine learning techniques, which attempt to predict a target variable such as the diagnosis [18,19]. In supervised learning, the machine learns patterns from a set of labeled examples [20], enabling complex applications and accurate predictions on new data. Widely used families of methods include Support Vector Machines (SVMs), Artificial Neural Networks (ANNs), and Deep Learning (DL) [21]. According to the literature, most ASD diagnostic models have employed ADTree and SVM algorithms. ADTree is a classification algorithm composed of decision nodes, which hold conditions, and prediction nodes, which hold numerical scores. The SVM, in turn, seeks a hyperplane in N-dimensional space that clearly separates the data points of different classes [22,23].
This study proposes a new tool to support the diagnosis of Autism Spectrum Disorder by analyzing EEG signals labeled with the clinical diagnosis using machine learning and statistics. We also highlight the potential of EEG not only for diagnosis but also for therapeutic interventions, an area of interest within the larger project to which this work belongs. The remainder of this article is organized as follows: Section 1.2 reviews studies that use machine learning for diagnosis; Section 2 describes the database, the key machine learning and statistical concepts needed to follow the work, the data preprocessing, and the execution of the experiments; the Results and Discussion are presented in Section 3 and Section 4, respectively, followed by the Conclusion in Section 5.

1.2. Related Works

This section presents studies that explore EEG and machine learning for ASD diagnosis, highlighting different techniques and classification models.
Recent studies have explored resting-state EEG in Autism Spectrum Disorder (ASD) diagnosis and characterization. Children with ASD show increased power spectral density in fast frequency bands, higher variability, and lower complexity compared to typically developing children [24]. A hybrid graph convolutional network model, Rest-HGCN, achieved high accuracy in ASD diagnosis using resting-state EEG signals [25]. Symptom-based clustering revealed distinct EEG functional connectivity patterns in mild and severe ASD subgroups with increased beta-band connectivity in mild ASD and decreased alpha-band connectivity in severe ASD [26]. A novel approach using time-series maps of brain functional connectivity and a combined CNN-LSTM model achieved classification accuracies of 81.08% and 74.55% for resting and task states, respectively [27]. These studies demonstrate the potential of resting-state EEG as a biomarker for ASD diagnosis and subtyping.
Tang et al. (2024) propose a novel hybrid graph convolutional network (Rest-HGCN) for diagnosing Autism Spectrum Disorder (ASD) using resting-state EEG signals [25]. The primary objective is to extract stable brain connectivity patterns through functional networks and leverage them for accurate classification. The methodology involves constructing brain functional graphs based on prior cognitive knowledge, which is followed by dual-branch graph convolutional layers that learn common and discriminative features. An attention mechanism is applied to integrate these features effectively. The model was evaluated on the ABC-CT dataset, comprising 399 participants (280 ASD and 119 controls), achieving an accuracy of 87.12% in single-subject experiments and 85.32% in cross-subject validation. Notably, the authors also conducted visualization and ablation studies to validate the robustness and interpretability of their approach.
While both studies aim at EEG-based ASD classification using machine learning, key differences exist in methodology and scale. The Rest-HGCN model focuses on graph-based feature extraction, modeling brain connectivity as a network and applying deep learning techniques to capture inter-regional interactions. In contrast, our work relies on handcrafted time-domain and frequency-domain features, which are combined with swarm intelligence (PSO) and evolutionary search for feature optimization. The Rest-HGCN uses a much larger dataset (n = 399), allowing for more generalizable conclusions, whereas our study is limited to n = 56 due to data availability. However, our approach achieves higher classification accuracy (99.23%) on a smaller sample, suggesting that even limited datasets can yield strong results when paired with effective feature selection and classical ML models like SVM. While Rest-HGCN excels in neurophysiological interpretability, our method prioritizes computational efficiency and model transparency, which may be preferable for real-world deployment in clinical settings.
Zhu et al. (2024) investigate heterogeneity among individuals with Autism Spectrum Disorder (ASD) by applying symptom-based clustering to identify distinct subgroups and analyzing their resting-state EEG functional connectivity patterns [26]. The authors used k-means clustering on clinical symptom scores (from ADOS and ADI-R assessments) to classify 80 children with ASD into two subgroups: one exhibiting higher social–communication deficits and lower repetitive behaviors and the other showing lower social deficits but higher repetitive behaviors. Functional connectivity was assessed using Pearson correlation matrices derived from 62-channel EEG data in multiple frequency bands. Results revealed divergent patterns between subgroups, particularly in the theta and alpha bands, suggesting that ASD subtypes may exhibit distinct neural mechanisms. This study contributes to the understanding of ASD heterogeneity and supports the move toward stratified biomarker research rather than treating ASD as a single entity.
While both Zhu et al. (2024) [26] and our work use EEG-based functional connectivity for insights into ASD, there are notable differences in scope and methodology. Zhu et al. (2024) [26] focus on intra-ASD subgroup differentiation, leveraging clinical symptom profiles to uncover neurophysiological variability. In contrast, our work addresses binary classification (ASD vs. control) using machine learning models and evolutionary feature selection techniques such as PSO. Their approach is more aligned with neuroscience discovery and stratification, while ours emphasizes diagnostic support via interpretable ML pipelines. Additionally, their dataset includes 80 children aged 6–14, whereas our dataset contains 56 adults aged 18–68, highlighting different developmental stages and applications. While they employed Pearson correlation and graph theory metrics, we extracted time-domain and spectral features followed by optimization to enhance model performance. Both studies reinforce the value of EEG as a tool for understanding and diagnosing ASD, but they differ in focus: ours on classification accuracy and practical deployment, theirs on subgroup characterization and neurobiological insight.
Xu et al. (2024) propose a deep learning framework for detecting Autism Spectrum Disorder (ASD) from EEG signals, combining convolutional and recurrent neural networks with an attention mechanism, along with data augmentation techniques to enhance model performance [27]. The authors used a dataset comprising resting-state and task-based EEG recordings from 60 participants (30 ASD and 30 controls), which were collected under two conditions: eyes-open resting state and visual stimulation task involving biomotion stimuli. A combination of CNN-LSTM architecture was employed to extract spatial and temporal features, while attention layers were added to emphasize discriminative regions in the EEG time-series. Data augmentation was performed through random cropping, noise injection, and signal permutation to mitigate overfitting due to limited sample size. The best-performing model achieved 75.68% accuracy on resting-state data and 69.09% on task-based data, outperforming baseline models such as VGG11 and LSTM alone.
Both Xu et al. (2024) [27] and our work aim at EEG-based classification of ASD using machine learning, but they differ significantly in methodology and objectives. The DL-ASD study leverages deep learning architectures (CNN-LSTM + attention) and emphasizes end-to-end feature extraction, whereas our approach is based on handcrafted time-domain and frequency-domain features followed by evolutionary search and PSO optimization before classification with classical ML models like SVM and Random Forest. While their method requires substantial computational resources and relies on automatic feature learning, our pipeline prioritizes model interpretability and computational efficiency, which may be more suitable for deployment in clinical settings with limited infrastructure. In terms of dataset size, both studies face limitations: theirs includes 60 participants, ours 56 participants, highlighting the ongoing challenge of accessing large-scale labeled EEG data in ASD research. However, despite similar constraints, our model achieves significantly higher accuracy (99.23%), suggesting that effective feature selection and optimization can compensate for smaller datasets when paired with appropriate classification strategies.
Kang et al. (2020) [28] combined EEG and eye-tracking data, achieving 85.44% accuracy and an AUC of 0.93, demonstrating the potential of multimodal approaches. Similarly, Hossain et al. (2021) [29] explored machine learning tools for early ASD screening, reaching 96% accuracy by categorizing data based on age, gender, and jaundice. Chung et al. (2024) [30] linked EEG patterns to Restricted and Repetitive Behaviors (RRBs) in infants, showing that EEG traits may help detect ASD-related behaviors early.
Deep learning approaches have also shown promising results. Ali et al. (2020) [31] trained deep neural networks on EEG data, achieving 80% accuracy. Radhakrishnan et al. (2021) [32] explored deep convolutional networks, with ResNet50 reaching 81% accuracy. Baygin et al. (2021) [33] used 1D LBP and STFT with MobileNetV2 and SqueezeNet, obtaining 96.44% accuracy. Meanwhile, Oh et al. (2021) [34] applied a degree-2 polynomial SVM, achieving 98.70% accuracy and surpassing the other algorithms tested.
Other studies focused on feature extraction and classification techniques. Jayawardana et al. (2019) [35] used Frequency Band Decomposition and Wavelet Transform with Random Forest and CNN, achieving over 90% accuracy. Alotaibi and Maharatna (2021) [36] analyzed brain connectivity with Phase Locking Value (PLV) methods, reaching 95.8% accuracy. Abdolzadegan et al. (2020) [37] incorporated linear and non-linear EEG features, using KNN and SVM with DBSCAN artifact removal, leading to 94.68% accuracy.
Dede et al. (2023) [38] analyzed resting-state EEG data from 776 participants across multiple datasets, including the Sheffield dataset used in our study. Their work concluded that univariate biomarkers alone are insufficient for distinguishing ASD from controls. Our approach extends beyond univariate features by extracting multivariate patterns through machine learning models combined with feature selection techniques such as PSO and evolutionary search. These methods allow us to explore complex interactions between EEG features, potentially offering more sensitive and robust classification than traditional univariate approaches.
These studies reinforce the effectiveness of EEG-based machine learning for ASD detection with various models demonstrating high accuracy and potential for clinical applications.
Our work builds upon and extends several previous studies. Unlike Kang et al. (2020) [28], who focused on multimodal integration of EEG and eye-tracking data, we emphasize feature optimization via swarm intelligence and evolutionary search. Compared to deep learning approaches such as Baygin et al. (2021) [33], our methodology prioritizes interpretability and computational efficiency while maintaining high accuracy. Additionally, our combination of PSO and SVM contrasts with the DBSCAN-based preprocessing used by Abdolzadegan et al. (2020) [37], allowing us to extract compact yet effective feature sets suitable for deployment in real-world settings.

2. Materials and Methods

This section presents the materials and methods used in this study. We begin by detailing the selected database, which is followed by the preprocessing steps applied to the data. Next, we describe the classifiers used for training and testing as well as the evaluation metrics chosen to assess model performance.

2.1. Database

This study utilized Dataset 1 from the Sheffield database, which includes EEG signals from 56 adult participants aged 18 to 68 years. The dataset comprises two balanced groups: 28 individuals diagnosed with Autism Spectrum Conditions (ASC) and 28 neurotypical control participants. Data were recorded using the Biosemi Active Two EEG system for 150 s with participants at rest and under visual stimulation. A bandpass filter (0.01 to 140 Hz) was applied with Cz as the reference channel. EEG recordings primarily used a 64-sensor montage, while 128-sensor setups were adjusted for consistency [39,40]. Preprocessing with EEGLAB [41] included filtering, the removal of corrupted data, and downsampling to 512 Hz. Several challenges were noted in database analysis, including missing electrodes (Figure 1), which can disrupt data completeness and affect interpretation. Variability in electrode placement due to non-standardized positioning also influenced measurement accuracy. Additionally, high-density electrode systems, while improving resolution, increased data complexity, requiring significant computational resources for processing and analysis.
ASD is a spectrum disorder with significant heterogeneity among individuals; however, the Sheffield dataset provides only binary labels: ASD or neurotypical control. Since the acquisition methodology requires participants to undergo an extended EEG recording protocol, it is probable that the autistic individuals correspond to support level 1.
It is important to acknowledge that the dataset used in this study contains only 56 participants, which may be considered small for robust machine learning model training. However, this limitation reflects the current availability of publicly accessible EEG datasets with clinical labels for ASD. Several recent studies have also employed similarly sized samples due to limited access to high-quality, labeled EEG data [13,35,37]. To mitigate overfitting and improve reliability, we applied cross-validation techniques (10-fold × 30 repetitions). Furthermore, this work serves as an exploratory and methodological foundation, aiming to validate the proposed pipeline before applying it to larger, multi-site datasets in future research.

2.2. Preprocessing

The data preprocessing stage followed the flow depicted in Figure 2.
The original files in .set and .fdt formats were converted to .edf, which is a widely used format for storing biomedical signals such as EEG and PSG due to its ability to handle multiple channels of temporal data [42].
To facilitate analysis, the .edf files were converted to .csv using Python 3.11.1 [43], making them compatible with various processing tools. Missing electrode data were addressed using Inverse Distance Weighting (IDW) interpolation, which was modified with a negative exponential function to reduce computational cost and avoid division by zero. The 128-electrode system was visually compared to the 64-electrode system, and missing channels were identified. Empty columns were added in the correct order, ensuring alignment with the electrode layout shown in Figure 3, based on the ASD 113 file, which was the only one with complete electrode signals.
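As an illustration of the conversion to .csv described above, the sketch below reads an .edf recording and writes it out as a table of samples per channel. The paper states only that Python 3.11 was used, so the choice of the MNE-Python library and the file names are assumptions.

```python
# Minimal sketch of the .edf -> .csv conversion step (library choice assumed: MNE-Python).
import mne

# Load one converted EEG recording; the file name is hypothetical.
raw = mne.io.read_raw_edf("ASD_113.edf", preload=True)

# to_data_frame() returns a time column plus one column per channel,
# so each row is one sample across all electrodes.
df = raw.to_data_frame()
df.to_csv("ASD_113.csv", index=False)
```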
The software GIMP (version 2.10.34) [44] was used to obtain the coordinates needed for distance calculations. Using this, the distance between each point was determined using Equation (1), which is based on the Pythagorean theorem [45]. In the formula, “a” and “b” are the points, “X1” and “Y1” are the coordinates of point “a”, and “X2” and “Y2” are the coordinates of point “b”. Equation (2) below was used to fill in the missing channels:
d_{ab} = \sqrt{(X_2 - X_1)^2 + (Y_2 - Y_1)^2}        (1)

p_j = \frac{\sum_{i=1}^{m} p_i \, e^{-d_{ij}}}{\sum_{i=1}^{m} e^{-d_{ij}}}        (2)

The missing channel "p_j" represents the final interpolated signal, while "p_i" corresponds to the existing non-missing channels, ensuring that i ≠ j. Once all missing data were processed, a complete dataset was generated and converted into a CSV file.
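The following sketch shows how Equations (1) and (2) can be implemented. The Euclidean distance and the negative-exponential weighting follow the description above; the NumPy implementation and variable names are our own assumptions.

```python
import numpy as np

def interpolate_missing_channel(signals, coords, missing_idx, present_idx):
    """Estimate a missing channel with exponentially weighted inverse-distance interpolation.

    signals:     array (n_channels, n_samples) with the recorded EEG
    coords:      array (n_channels, 2) of 2-D electrode coordinates
    missing_idx: index j of the channel to reconstruct
    present_idx: indices i of the m channels that were recorded (i != j)
    """
    # Equation (1): Euclidean distance between electrode j and each recorded electrode i
    d = np.sqrt(((coords[present_idx] - coords[missing_idx]) ** 2).sum(axis=1))

    # Negative-exponential weights avoid division by zero when a distance is very small
    w = np.exp(-d)

    # Equation (2): weighted average of the recorded channels
    return (w[:, None] * signals[present_idx]).sum(axis=0) / w.sum()
```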
For further analysis, GNU Octave (version 8.2.0) [46] was used to segment the EEG signal and extract attributes. A total of 34 features were obtained from each window, as listed in Figure 4; prior studies confirm their effectiveness in EEG analysis [47] and audio analysis [48,49,50]. These features include statistical measures (mean, variance, standard deviation), time-domain metrics (waveform length, zero crossing, Hjorth parameters), and frequency-based attributes (mean power, peak frequency, Shannon entropy). The signal was windowed into 2 s segments with 0.5 s overlap at a sampling rate of 512 Hz. Finally, the most relevant attributes were selected using Particle Swarm Optimization (PSO) and evolutionary search methods [51]. Thus, three final files were generated: the original dataset, the dataset optimized with PSO, and the dataset optimized with evolutionary search.
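A minimal sketch of this windowing and feature extraction step is given below. It computes only a representative subset of the 34 attributes listed in Figure 4, and it uses Python/SciPy as an illustrative analogue of the authors' GNU Octave scripts, not their actual code.

```python
import numpy as np
from scipy.signal import welch

FS = 512                     # sampling rate (Hz)
WIN = 2 * FS                 # 2 s windows
HOP = WIN - int(0.5 * FS)    # consecutive windows overlap by 0.5 s

def hjorth_parameters(x):
    # Hjorth activity, mobility, and complexity from first and second differences
    dx, ddx = np.diff(x), np.diff(x, n=2)
    activity = np.var(x)
    mobility = np.sqrt(np.var(dx) / activity)
    complexity = np.sqrt(np.var(ddx) / np.var(dx)) / mobility
    return activity, mobility, complexity

def window_features(x, fs=FS):
    # Welch periodogram for the frequency-domain attributes
    f, pxx = welch(x, fs=fs, nperseg=min(len(x), fs))
    activity, mobility, complexity = hjorth_parameters(x)
    return {
        "mean": x.mean(),
        "variance": x.var(),
        "std": x.std(),
        "waveform_length": np.abs(np.diff(x)).sum(),
        "zero_crossings": int(((x[:-1] * x[1:]) < 0).sum()),
        "mean_power": pxx.mean(),
        "peak_frequency": f[np.argmax(pxx)],
        "hjorth_activity": activity,
        "hjorth_mobility": mobility,
        "hjorth_complexity": complexity,
    }

def extract(signal):
    """Slide a 2 s window with 0.5 s overlap over one channel and collect features."""
    return [window_features(signal[s:s + WIN])
            for s in range(0, len(signal) - WIN + 1, HOP)]
```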

2.3. Classification

Using the three distinct datasets obtained with the methods described above, we trained, validated, and tested various classifiers, following the process shown in the diagram in Figure 5. The algorithms, chosen according to previous methodologies to represent distribution-based and decision tree-based classifiers [48,49,50,52,53,54,55,56,57,58], were Bayes Net [59], Naive Bayes [60], Random Tree [61], and Random Forest [62] with 10, 100, and 500 trees. We also used the Support Vector Machine [63] with several kernels: linear, polynomial of degree 2, polynomial of degree 3, and RBF with gamma values of 0.1, 0.2, and 0.3.
We used the Waikato Environment for Knowledge Analysis (WEKA) software, version 3.9.6 [64,65], to split each dataset into 80% training and 20% testing sets. The selected models and configurations were then applied. To ensure statistical robustness, each experiment was repeated 30 times in addition to performing 10-fold cross-validation.
In this method, the dataset is divided into 10 equal parts with the model trained on 9 parts and tested on the remaining one. This process repeats until all parts serve as a test set once, averaging the performance metrics across iterations. This technique enhances model reliability, reducing variance and bias while providing a more accurate estimate of generalization ability [66].
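The experiments were run in WEKA; the sketch below reproduces the same evaluation protocol (30 repetitions of 10-fold cross-validation over a few of the listed classifiers) using scikit-learn as an illustrative equivalent. The feature file name and the "label" column are hypothetical.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical CSV produced by the feature extraction step, with one label per window.
data = pd.read_csv("features_original.csv")
X, y = data.drop(columns=["label"]), data["label"]

models = {
    "naive_bayes": GaussianNB(),
    "random_forest_500": RandomForestClassifier(n_estimators=500, random_state=0),
    "svm_rbf_gamma_0.1": make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma=0.1)),
    "svm_poly_degree_2": make_pipeline(StandardScaler(), SVC(kernel="poly", degree=2)),
}

# 10-fold cross-validation repeated 30 times, as described above.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=30, random_state=0)
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    print(f"{name}: {100 * scores.mean():.2f}% ± {100 * scores.std():.2f}")
```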

3. Results

To assess the statistical power of our classification task with n = 56 (28 per group), we performed a power analysis. Assuming a medium effect size (Cohen's d = 0.5) and a two-tailed alpha of 0.05, the achieved power was approximately 78%. This level of power is generally considered acceptable for pilot or exploratory studies. While not ideal, this indicates that the sample size provides reasonable confidence in detecting moderate differences between groups.
This section presents and analyzes the research results. We first examined the complete dataset as a baseline, which was followed by evaluating the Particle Swarm Optimization (PSO) and evolutionary search methods. Tests were conducted on the best models from each dataset to determine the most effective techniques for future studies.
Table 1 displays the results using the full dataset, including all channels and features. The accuracy difference between the best and worst models reached 40.55%, highlighting the variation in performance.
The results of the training and validation stage after feature selection with PSO are presented in Table 2. In this approach, the interval between the average accuracies of the best and worst models was 39.74%.
The results of the evolutionary search-based feature selection are presented in Table 3. In this approach, the accuracy gap between the best and worst models was 38.18%.
Figure 6 and Figure 7 display boxplots illustrating the dispersion of accuracy and kappa index values for the best model in each approach. Some dispersion patterns and outliers were identified, which will be analyzed in the next section.
The best models from each approach (complete, PSO, and evolutionary search) were finally tested on separate samples from the dataset in a single round. Table 4 shows the results of this test. The same testing stage generated the confusion matrices for each classifier, shown in Figure 8, which allow the true and false positive and negative rates of each model to be analyzed. This analysis is presented in Section 4.

4. Discussion

From Table 1, Naive Bayes had the weakest performance, with an accuracy of 58.58%, a kappa index of 0.17, sensitivity of 0.25, and AUC of 0.60. Its specificity (0.92) was the exception, but the low sensitivity indicates near-random detection of ASD windows. Bayes Net performed slightly better, with 74.2% accuracy and improved sensitivity.
Tree-based models performed significantly better, with Decision Trees and Random Forests achieving over 90% in accuracy, sensitivity, and specificity. The Decision Tree had a lower kappa index (0.80), suggesting weaker agreement between training and validation. Random Forest with 100 and 500 trees excelled, reaching over 98% accuracy with well-balanced sensitivity and specificity.
Among the Support Vector Machines (SVMs), polynomial kernels (degrees 1, 2, and 3) reached 96%, 97%, and 98% accuracy, maintaining balanced true positive and true negative rates. The best-performing model was the SVM with an RBF kernel (gamma = 0.1), achieving 99.13% accuracy with a standard deviation of 0.44, highlighting potential refinements for classifier optimization.
From Table 2, after applying PSO, the dataset’s dimensionality significantly decreased from 2142 features (34 extracted from each of the 63 electrodes) to 686. This reduction improved processing efficiency while maintaining classifier effectiveness with performance levels similar to the previous approach.
Naive Bayes remained the weakest model in training, though it showed slight improvements across key metrics. Sensitivity for ASD classification remained its main limitation. Bayes Net and Random Tree also continued to rank among the lowest-performing models.
Among the classifiers, the SVM with an RBF kernel and gamma of 0.2 stood out, achieving 99.23% accuracy, a kappa index of 0.98, and 99% across sensitivity, specificity, and AUC. Despite the significant reduction in features, these metrics remained close to 100%. Other SVM models, except those with a linear polynomial kernel, delivered results similar to the best-performing classifier.
The Random Forest configurations, both the simplest with 10 trees and the largest with 500 trees, also achieved metrics above 90%. Since they require less computational processing, they represent a viable alternative solution.
Finally, an evolutionary search was applied to select the most relevant features, reducing the dataset from 2142 to 20 key features. Unlike previous approaches where SVM with an RBF kernel performed best, this method yielded the best results with Random Forest. As shown in Table 3, both the 100-tree and 500-tree configurations achieved over 93% accuracy, sensitivity, and specificity, along with a kappa index of 0.87 and an AUC of 0.98.
Performance among other classifiers varied significantly. Naive Bayes had the weakest results with 55.73% accuracy, a kappa index of 0.11, and low sensitivity (0.18) but high specificity (0.92), indicating poor suitability for this dataset. Bayes Net performed similarly to SVM but showed weaker results compared to its performance in previous approaches with 68.61% accuracy. SVM models displayed inconsistency, with the polynomial kernel (degree 3) performing best at 72.59% accuracy and a kappa index of 0.45, but it was still below Random Forest. Random Tree showed strong performance, reaching 87.43% accuracy and kappa values above 0.70. In the end, Random Forest with 500 trees was selected as the best classifier, reinforcing its effectiveness for this classification problem.
Figure 6 presents the boxplots of the best models from each approach. The Random Forest with 500 trees showed a wider range of values compared to SVM models, with greater variability in the evolutionary search-based dataset, despite having the lowest median. In contrast, the SVM models in the original and PSO-based datasets had similar distributions, with values closer to their medians, suggesting greater precision.
Figure 7 illustrates the dispersion of kappa index values. The SVM models exhibited more consistent classification agreement with a smaller spread between maximum and minimum values. The Random Forest with 500 trees in the evolutionary search approach had more dispersed values, with 75% of observations below 0.9, and even its highest values were lower than the worst SVM results in the other approaches.
The test results in Table 4 show that all three models performed well, with similar outcomes. The SVM models for the complete and PSO-reduced datasets showed slight decreases across all five metrics, whereas the Random Forest with the evolutionary search dataset improved at the testing stage, which was expected given the lower statistical rigor of this single round. The top-performing classifier in this test round was the SVM with RBF kernel (gamma = 0.2) after PSO, reinforcing the effectiveness of these models when applied to new data.
Figure 8a shows that the SVM with the complete dataset correctly classified 562 ASD windows and 561 control windows, maintaining high sensitivity and specificity. In Figure 8b, the SVM after PSO correctly identified 570 ASD and 576 control windows. Lastly, Figure 8c presents the Random Forest with evolutionary search, classifying 569 ASD windows correctly, with 26 false positives and 562 control windows correctly, with 28 misclassifications.
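As an illustration of how the kappa index relates to these confusion matrices, the snippet below computes accuracy and Cohen's kappa from the counts reported for Figure 8c. Assigning the 26 errors to control windows labeled as ASD and the 28 errors to ASD windows labeled as control is our reading of the sentence above, so these assignments are assumptions.

```python
# Counts reported for Figure 8c (Random Forest with evolutionary-search features).
tp, fn = 569, 28   # ASD windows: correctly classified / labeled as control (assumed)
tn, fp = 562, 26   # control windows: correctly classified / labeled as ASD (assumed)

n = tp + fn + tn + fp
p_observed = (tp + tn) / n                                          # accuracy
p_chance = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n**2   # chance agreement
kappa = (p_observed - p_chance) / (1 - p_chance)
print(f"accuracy = {p_observed:.4f}, kappa = {kappa:.4f}")
```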
The superior performance of SVM with RBF kernel and Random Forest suggests that non-linear relationships in EEG data play a key role in differentiating ASD from controls. The effectiveness of PSO-based feature selection highlights the importance of optimizing input features, reducing redundancy while preserving discriminative information. Meanwhile, the evolutionary search approach demonstrated that even fewer features (20 out of 2142) can still yield strong classification accuracy (about 93%), suggesting potential for lightweight, deployable models. These findings reinforce the viability of combining traditional ML algorithms with intelligent search strategies for efficient and accurate ASD classification.

5. Conclusions

This study proposed a machine learning-based method for diagnosing Autism Spectrum Disorder (ASD) using EEG signals. By applying interpolation to estimate missing signals and using Particle Swarm Optimization for feature selection, the approach demonstrated strong classification performance with Random Forest and Support Vector Machine (SVM) models. Our best-performing model, the SVM with RBF kernel after PSO-based feature selection, achieved an accuracy of 99.23%, sensitivity of 99%, specificity of 99%, and an AUC of 0.99. These results suggest that EEG-based machine learning can achieve high classification accuracy even with a relatively small dataset when combined with effective preprocessing and feature optimization techniques.
Despite its strengths, some limitations must be considered, such as variability in EEG recording protocols and the need for validation with independent samples. Using a public dataset may also introduce biases, affecting result generalization. Addressing these challenges is crucial to refining the method and enhancing its clinical applicability.
Future work includes developing a proprietary EEG database with standardized data collection and designing software for automated EEG analysis and diagnostic support. This will optimize healthcare workflows and contribute to broader access to ASD diagnosis. By improving diagnostic methods, this study can help facilitate early interventions, enhance patient outcomes, and reduce healthcare costs, representing a significant step toward more efficient and accessible ASD detection.

Author Contributions

Conceptualization: F.S.F., G.M.M.M., M.A.d.S., J.C.G. and W.P.d.S.; Methodology: F.S.F., G.M.M.M., M.A.d.S., J.C.G. and W.P.d.S.; Software: F.S.F., A.S.d.O.S., M.V.S.M., C.V.N.d.O., A.M.N.d.M., M.L.M.d.S.P., A.B.d.S.S., T.C.V.d.S., G.M.M.M., M.A.d.S., J.C.G. and W.P.d.S.; Validation: F.S.F., A.S.d.O.S., M.V.S.M., C.V.N.d.O., A.M.N.d.M., M.L.M.d.S.P., A.B.d.S.S., T.C.V.d.S., G.M.M.M., M.A.d.S., J.C.G. and W.P.d.S.; Formal analysis: F.S.F., G.M.M.M., M.A.d.S., J.C.G. and W.P.d.S.; Investigation: F.S.F., A.S.d.O.S., M.V.S.M., C.V.N.d.O., A.M.N.d.M., M.L.M.d.S.P., A.B.d.S.S., T.C.V.d.S., C.C.d.S., C.L.d.L., G.M.M.M., M.A.d.S., J.C.G. and W.P.d.S.; Resources: F.S.F., G.M.M.M., M.A.d.S., J.C.G. and W.P.d.S.; Data curation: F.S.F., G.M.M.M., M.A.d.S., J.C.G. and W.P.d.S.; Writing—original draft preparation: F.S.F., A.S.d.O.S., M.V.S.M., C.V.N.d.O., A.M.N.d.M., M.L.M.d.S.P., A.B.d.S.S., T.C.V.d.S., G.M.M.M., M.A.d.S., J.C.G. and W.P.d.S.; Writing—review and editing: F.S.F., C.C.d.S., C.L.d.L., G.M.M.M., M.A.d.S., J.C.G. and W.P.d.S.; Visualization: F.S.F., A.S.d.O.S., M.V.S.M., C.V.N.d.O., A.M.N.d.M., M.L.M.d.S.P., A.B.d.S.S., T.C.V.d.S., C.C.d.S., C.L.d.L., G.M.M.M., M.A.d.S., J.C.G. and W.P.d.S.; Supervision: G.M.M.M., M.A.d.S., J.C.G. and W.P.d.S.; Data curation: G.M.M.M., M.A.d.S., J.C.G. and W.P.d.S.; Project administration: J.C.G. and W.P.d.S.; Funding acquisition: A.E.F.d.G., A.C.d.A.M., B.A.M.d.Q., M.G.N.M.d.S., R.A.S.C.L., S.d.S.S.F., S.d.S.J.d.O.C. and W.P.d.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Brazilian research agencies Conselho Nacional de Desenvolvimento Científico e Tecnológico—CNPq (CNPq 304636/2021-5), Fundação de Amparo à Ciência e Tecnologia do Estado de Pernambuco—FACEPE (IBPG-2267-3.13/22), Coordenação de Aperfeiçoamento de Pessoal de Nível Superior—CAPES, and Financiadora de Estudos e Projetos—FINEP (2170/22).

Data Availability Statement

This study utilized Dataset 1 from the Sheffield public database, containing EEG signals from 56 participants aged 18 to 68. Data were recorded using the Biosemi Active Two EEG system for 150 s with participants at rest and under visual stimulation. A bandpass filter (0.01 to 140 Hz) was applied with Cz as the reference channel. EEG recordings primarily used a 64-sensor montage, while 128-sensor setups were adjusted for consistency. The code developed by the authors is available upon request.

Acknowledgments

We would like to thank the Brazilian research agencies Conselho Nacional de Desenvolvimento Científico e Tecnológico—CNPq (CNPq 304636/2021-5), Fundação de Amparo à Ciência e Tecnologia do Estado de Pernambuco—FACEPE (IBPG-2267-3.13/22), Coordenação de Aperfeiçoamento de Pessoal de Nível Superior—CAPES, and Financiadora de Estudos e Projetos—FINEP (2170/22), for partially funding this research.

Conflicts of Interest

The authors declare no competing interests.

References

  1. American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders: DSM-5, 5th ed.; American Psychiatric Association Publishing: Washington, DC, USA, 2013. [Google Scholar]
  2. Lord, C.; Elsabbagh, M.; Baird, G.; Veenstra-Vanderweele, J. Autism spectrum disorder. Lancet 2018, 392, 508–520. [Google Scholar] [CrossRef] [PubMed]
  3. Lord, C.; Brugha, T.S.; Charman, T.; Cusack, J.; Dumas, G.; Frazier, T.; Jones, E.J.; Jones, R.M.; Pickles, A. Autism spectrum disorder. Nat. Rev. Dis. Prim. 2020, 6, 5. [Google Scholar] [CrossRef] [PubMed]
  4. Vismara, L.A.; Rogers, S.J. Behavioral treatments in autism spectrum disorder: What do we know? Annu. Rev. Clin. Psychol. 2010, 6, 447–468. [Google Scholar] [CrossRef] [PubMed]
  5. Kulage, K.M.; Smaldone, A.M.; Cohn, E.G. How will DSM-5 affect autism diagnosis? A systematic literature review and meta-analysis. J. Autism Dev. Disord. 2014, 44, 1918–1932. [Google Scholar] [CrossRef]
  6. Volkmar, F.R.; Reichow, B. Autism in DSM-5: Progress and challenges. Mol. Autism 2013, 4, 1–6. [Google Scholar] [CrossRef]
  7. Kulage, K.M.; Goldberg, J.; Usseglio, J.; Romero, D.; Bain, J.M.; Smaldone, A.M. How has DSM-5 affected autism diagnosis? A 5-year follow-up systematic literature review and meta-analysis. J. Autism Dev. Disord. 2020, 50, 2102–2127. [Google Scholar] [CrossRef]
  8. Sharma, S.R.; Gonda, X.; Tarazi, F.I. Autism spectrum disorder: Classification, diagnosis and therapy. Pharmacol. Ther. 2018, 190, 91–104. [Google Scholar] [CrossRef]
  9. Dias, C.C.V.; Maciel, S.C.; Silva, J.V.C.d.; Menezes, T.d.S.B.d. Representações sociais sobre o autismo elaboradas por estudantes universitários. Psico-USF 2022, 26, 631–643. [Google Scholar] [CrossRef]
  10. Oliveira, G. Autismo: Diagnóstico e orientação. Parte I-Vigilância, rastreio e orientação nos cuidados primários de saúde. Acta Pediatr. Port. 2009, 40, 278–287. [Google Scholar]
  11. Heinsfeld, A.S.; Franco, A.R.; Craddock, R.C.; Buchweitz, A.; Meneguzzi, F. Identification of autism spectrum disorder using deep learning and the ABIDE dataset. NeuroImage Clin. 2018, 17, 16–23. [Google Scholar] [CrossRef]
  12. Khodatars, M.; Shoeibi, A.; Sadeghi, D.; Ghaasemi, N.; Jafari, M.; Moridian, P.; Khadem, A.; Alizadehsani, R.; Zare, A.; Kong, Y.; et al. Deep learning for neuroimaging-based diagnosis and rehabilitation of autism spectrum disorder: A review. Comput. Biol. Med. 2021, 139, 104949. [Google Scholar] [CrossRef] [PubMed]
  13. Kaur, J.; Kaur, A. A review on analysis of EEG signals. In Proceedings of the 2015 International Conference on Advances in Computer Engineering and Applications, Ghaziabad, India, 19–20 March 2015; pp. 957–960. [Google Scholar]
  14. Subha, D.P.; Joseph, P.K.; Acharya, U.R.; Lim, C.M. EEG signal analysis: A survey. J. Med. Syst. 2010, 34, 195–212. [Google Scholar] [CrossRef] [PubMed]
  15. Tavares, M.C. EEG e potenciais evocados: Uma introdução. Contronic Sist. Automát. Ltda 2011, 2011, 1–13. [Google Scholar]
  16. Cantarelli, T.L.; Júnior, J.; Júnior, S. Fundamentos da medição do eeg: Uma introdução. Semin. ELETRONICA E AUTOMAÇÃO, Ponta Grossa. 2016. Available online: https://www.researchgate.net/profile/Jose-Mendes-Junior-2/publication/308400572_FUNDAMENTOS_DA_MEDICAO_DO_EEG_UMA_INTRODUCAO/links/57e2c45d08aecd0198dd808b/FUNDAMENTOS-DA-MEDICAO-DO-EEG-UMA-INTRODUCAO.pdf (accessed on 26 April 2025).
  17. Li, B.; Cheng, T.; Guo, Z. A review of EEG acquisition, processing and application. J. Phys. Conf. Ser. 2021, 1907, 012045. [Google Scholar] [CrossRef]
  18. Oro, A.B.; Navarro-Calvillo, M.; Esmer, C. Autistic Behavior Checklist (ABC) and its applications. In Comprehensive Guide to Autism; Springer: New York, NY, USA, 2014; pp. 2787–2798. [Google Scholar]
  19. Pagnozzi, A.M.; Conti, E.; Calderoni, S.; Fripp, J.; Rose, S.E. A systematic review of structural MRI biomarkers in autism spectrum disorder: A machine learning perspective. Int. J. Dev. Neurosci. 2018, 71, 68–82. [Google Scholar] [CrossRef]
  20. De Bruijne, M. Machine learning approaches in medical image analysis: From detection to diagnosis. Med. Image Anal. 2016, 33, 94–97. [Google Scholar] [CrossRef]
  21. Tanveer, M.; Richhariya, B.; Khan, R.U.; Rashid, A.H.; Khanna, P.; Prasad, M.; Lin, C. Machine learning techniques for the diagnosis of Alzheimer’s disease: A review. ACM Trans. Multimed. Comput. Commun. Appl. (TOMM) 2020, 16, 1–35. [Google Scholar] [CrossRef]
  22. Hyde, K.K.; Novack, M.N.; LaHaye, N.; Parlett-Pelleriti, C.; Anden, R.; Dixon, D.R.; Linstead, E. Applications of supervised machine learning in autism spectrum disorder research: A review. Rev. J. Autism Dev. Disord. 2019, 6, 128–146. [Google Scholar] [CrossRef]
  23. Sandin, S.; Lichtenstein, P.; Kuja-Halkola, R.; Hultman, C.; Larsson, H.; Reichenberg, A. The heritability of autism spectrum disorder. JAMA 2017, 318, 1182–1184. [Google Scholar] [CrossRef]
  24. Angulo-Ruiz, B.Y.; Ruiz-Martínez, F.J.; Rodríguez-Martínez, E.I.; Ionescu, A.; Saldaña, D.; Gómez, C.M. Linear and non-linear analyses of EEG in a group of ASD children during resting state condition. Brain Topogr. 2023, 36, 736–749. [Google Scholar] [CrossRef]
  25. Tang, T.; Li, C.; Zhang, S.; Chen, Z.; Yang, L.; Mu, Y.; Chen, J.; Xu, P.; Gao, D.; Li, F.; et al. A hybrid graph network model for ASD diagnosis based on resting-state EEG signals. Brain Res. Bull. 2024, 206, 110826. [Google Scholar] [CrossRef] [PubMed]
  26. Zhu, G.; Li, Y.; Wan, L.; Sun, C.; Liu, X.; Zhang, J.; Liang, Y.; Liu, G.; Yan, H.; Li, R.; et al. Divergent electroencephalogram resting-state functional network alterations in subgroups of autism spectrum disorder: A symptom-based clustering analysis. Cereb. Cortex 2024, 34, bhad413. [Google Scholar] [CrossRef] [PubMed]
  27. Xu, Y.; Yu, Z.; Li, Y.; Liu, Y.; Li, Y.; Wang, Y. Autism spectrum disorder diagnosis with EEG signals using time series maps of brain functional connectivity and a combined CNN–LSTM model. Comput. Methods Programs Biomed. 2024, 250, 108196. [Google Scholar] [CrossRef] [PubMed]
  28. Kang, J.; Han, X.; Song, J.; Niu, Z.; Li, X. The identification of children with autism spectrum disorder by SVM approach on EEG and eye-tracking data. Comput. Biol. Med. 2020, 120, 103722. [Google Scholar] [CrossRef]
  29. Hossain, M.D.; Kabir, M.A.; Anwar, A.; Islam, M.Z. Detecting autism spectrum disorder using machine learning techniques: An experimental analysis on toddler, child, adolescent and adult datasets. Health Inf. Sci. Syst. 2021, 9, 1–13. [Google Scholar] [CrossRef]
  30. Chung, H.; Wilkinson, C.; Said, A.; Nelson, C. Evaluating early EEG correlates of restricted and repetitive behaviors for toddlers with or without autism. Res. Sq. 2024, 2024, rs-3. [Google Scholar] [CrossRef]
  31. Ali, N.A.; Syafeeza, A.; Jaafar, A.; Alif, M.; Ali, N. Autism spectrum disorder classification on electroencephalogram signal using deep learning algorithm. IAES Int. J. Artif. Intell. 2020, 9, 91–99. [Google Scholar] [CrossRef]
  32. Radhakrishnan, M.; Ramamurthy, K.; Choudhury, K.K.; Won, D.; Manoharan, T.A. Performance analysis of deep learning models for detection of autism spectrum disorder from EEG signals. Trait. Signal 2021, 38, 853–863. [Google Scholar] [CrossRef]
  33. Baygin, M.; Dogan, S.; Tuncer, T.; Barua, P.D.; Faust, O.; Arunkumar, N.; Abdulhay, E.W.; Palmer, E.E.; Acharya, U.R. Automated ASD detection using hybrid deep lightweight features extracted from EEG signals. Comput. Biol. Med. 2021, 134, 104548. [Google Scholar] [CrossRef]
  34. Oh, S.L.; Jahmunah, V.; Arunkumar, N.; Abdulhay, E.W.; Gururajan, R.; Adib, N.; Ciaccio, E.J.; Cheong, K.H.; Acharya, U.R. A novel automated autism spectrum disorder detection system. Complex Intell. Syst. 2021, 7, 2399–2413. [Google Scholar] [CrossRef]
  35. Jayawardana, Y.; Jaime, M.; Jayarathna, S. Analysis of Temporal Relationships between ASD and Brain Activity through EEG and Machine Learning. In Proceedings of the 2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science (IRI), Los Angeles, CA, USA, 30 July–1 August 2019; pp. 151–158. [Google Scholar] [CrossRef]
  36. Alotaibi, N.; Maharatna, K. Classification of autism spectrum disorder from EEG-based functional brain connectivity analysis. Neural Comput. 2021, 33, 1914–1941. [Google Scholar] [CrossRef] [PubMed]
  37. Abdolzadegan, D.; Moattar, M.H.; Ghoshuni, M. A robust method for early diagnosis of autism spectrum disorder from EEG signals based on feature selection and DBSCAN method. Biocybern. Biomed. Eng. 2020, 40, 482–493. [Google Scholar] [CrossRef]
  38. Dede, A.J.O.; Xiao, W.; Vaci, N.; Cohen, M.X.; Milne, E. Lack of univariate, clinically-relevant biomarkers of autism in resting state EEG: A study of 776 participants. MedRxiv 2023. [Google Scholar] [CrossRef]
  39. Dickinson, A.; Jeste, S.; Milne, E. Electrophysiological signatures of brain aging in autism spectrum disorder. Cortex 2022, 148, 139–151. [Google Scholar] [CrossRef]
  40. Oostenveld, R.; Praamstra, P. The five percent electrode system for high-resolution EEG and ERP measurements. Clin. Neurophysiol. 2001, 112, 713–719. [Google Scholar] [CrossRef]
  41. Delorme, A.; Makeig, S. EEGLAB: An open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. J. Neurosci. Methods 2004, 134, 9–21. [Google Scholar] [CrossRef]
  42. Kemp, B.; Olivan, J. European data format ‘plus’ (EDF+), an EDF alike standard format for the exchange of physiological data. Clin. Neurophysiol. 2003, 114, 1755–1761. [Google Scholar] [CrossRef]
  43. Van Rossum, G.; Drake, F.L. Python 3 Reference Manual; CreateSpace: Scotts Valley, CA, USA, 2009. [Google Scholar]
  44. Oliver, E.; Ruiz, J.; She, S.; Wang, J. The Software Architecture of the GIMP 2006. Available online: https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=51e9312a8c6376dd0b87e102422acff4c7209d19 (accessed on 5 June 2025).
  45. Maor, E. The Pythagorean Theorem: A 4,000-Year History. In Statistics and Computing; Princeton University Press: Princeton, NJ, USA, 2019. [Google Scholar] [CrossRef]
  46. Eaton, J.W.; Bateman, D.; Hauberg, S.; Wehbring, R. GNU Octave Version 10.1.0 Manual: A High-Level Interactive Language for Numerical Computations; 2025. Available online: https://docs.octave.org/latest/ (accessed on 5 June 2025).
  47. de Santana, M.A.; Fonseca, F.S.; Torcate, A.S.; dos Santos, W.P. Emotion Recognition from Multimodal Data: A machine learning approach combining classical and hybrid deep architectures. Res. Biomed. Eng. 2023, 39, 613–639. [Google Scholar] [CrossRef]
  48. Espinola, C.W.; Gomes, J.C.; Pereira, J.M.S.; Dos Santos, W.P. Detection of major depressive disorder using vocal acoustic analysis and machine learning—An exploratory study. Res. Biomed. Eng. 2021, 37, 53–64. [Google Scholar] [CrossRef]
  49. Wanderley Espinola, C.; Gomes, J.C.; Mônica Silva Pereira, J.; dos Santos, W.P. Detection of major depressive disorder, bipolar disorder, schizophrenia and generalized anxiety disorder using vocal acoustic analysis and machine learning: An exploratory study. Res. Biomed. Eng. 2022, 38, 813–829. [Google Scholar] [CrossRef]
  50. Espinola, C.W.; Gomes, J.C.; Pereira, J.M.S.; dos Santos, W.P. Vocal acoustic analysis and machine learning for the identification of schizophrenia. Res. Biomed. Eng. 2021, 37, 33–46. [Google Scholar] [CrossRef]
  51. Jiang, J.J.; Wei, W.X.; Shao, W.L.; Liang, Y.F.; Qu, Y.Y. Research on large-scale bi-level particle swarm optimization algorithm. IEEE Access 2021, 9, 56364–56375. [Google Scholar] [CrossRef]
  52. Santana, M.A.d.; Pereira, J.M.S.; Silva, F.L.d.; Lima, N.M.d.; Sousa, F.N.d.; Arruda, G.M.S.d.; Lima, R.d.C.F.d.; Silva, W.W.A.d.; Santos, W.P.d. Breast cancer diagnosis based on mammary thermography and extreme learning machines. Res. Biomed. Eng. 2018, 34, 45–53. [Google Scholar] [CrossRef]
  53. Barbosa, V.A.d.F.; Gomes, J.C.; de Santana, M.A.; Albuquerque, J.E.d.A.; de Souza, R.G.; de Souza, R.E.; dos Santos, W.P. Heg.IA: An intelligent system to support diagnosis of Covid-19 based on blood tests. Res. Biomed. Eng. 2021, 2021, 1–18. [Google Scholar]
  54. Gomes, J.C.; Barbosa, V.A.d.F.; Santana, M.A.; Bandeira, J.; Valença, M.J.S.; de Souza, R.E.; Ismael, A.M.; dos Santos, W.P. IKONOS: An intelligent tool to support diagnosis of COVID-19 by texture analysis of X-ray images. Res. Biomed. Eng. 2020, 2020, 1–14. [Google Scholar] [CrossRef]
  55. Oliveira, A.P.S.d.; De Santana, M.A.; Andrade, M.K.S.; Gomes, J.C.; Rodrigues, M.C.; dos Santos, W.P. Early diagnosis of Parkinson’s disease using EEG, machine learning and partial directed coherence. Res. Biomed. Eng. 2020, 36, 311–331. [Google Scholar] [CrossRef]
  56. Gomes, J.C.; Masood, A.I.; Silva, L.H.d.S.; da Cruz Ferreira, J.R.B.; Freire Junior, A.A.; Rocha, A.L.d.S.; de Oliveira, L.C.P.; da Silva, N.R.C.; Fernandes, B.J.T.; Dos Santos, W.P. Covid-19 diagnosis by combining RT-PCR and pseudo-convolutional machines to characterize virus sequences. Sci. Rep. 2021, 11, 11545. [Google Scholar] [CrossRef]
  57. Rodrigues, A.L.; de Santana, M.A.; Azevedo, W.W.; Bezerra, R.S.; Barbosa, V.A.; de Lima, R.C.; dos Santos, W.P. Identification of mammary lesions in thermographic images: Feature selection study using genetic algorithms and particle swarm optimization. Res. Biomed. Eng. 2019, 35, 213–222. [Google Scholar] [CrossRef]
  58. Barbosa, V.A.d.F.; Gomes, J.C.; de Santana, M.A.; de Lima, C.L.; Calado, R.B.; Bertoldo Junior, C.R.; Albuquerque, J.E.d.A.; de Souza, R.G.; de Araújo, R.J.E.; Mattos Junior, L.A.R.; et al. Covid-19 rapid test by combining a random forest-based web system and blood tests. J. Biomol. Struct. Dyn. 2022, 40, 11948–11967. [Google Scholar] [CrossRef]
  59. Zhao, P.; Zhang, L.; Liu, G.; Si, S. Design and Development of the Bayesian Network Platform Based on B/S Structure. In Proceedings of the 2011 Fourth International Symposium on Knowledge Acquisition and Modeling, Sanya, China, 8–9 October 2011; pp. 65–68. [Google Scholar]
  60. Dimitoglou, G.; Adams, J.A.; Jim, C.M. Comparison of the C4.5 and a Naïve Bayes classifier for the prediction of lung cancer survivability. arXiv 2012, arXiv:1206.1121. [Google Scholar]
  61. Niranjan, A.; Nutan, D.; Nitish, A.; Deepa-Shenoy, P.; Venugopal, K.; ERCR, T. Ensemble of random committee and random tree for efficient anomaly classification using voting. In Proceedings of the International Conference for Convergence in Technology, Pune, India, 6–8 April 2018; pp. 6–8. [Google Scholar]
  62. Yuan, D.; Huang, J.; Yang, X.; Cui, J. Improved random forest classification approach based on hybrid clustering selection. In Proceedings of the 2020 Chinese Automation Congress (CAC), Shanghai, China, 6–8 November 2020; pp. 1559–1563. [Google Scholar]
  63. Burbidge, R.; Trotter, M.; Buxton, B.; Holden, S. Drug design by machine learning: Support vector machines for pharmaceutical data analysis. Comput. Chem. 2001, 26, 5–14. [Google Scholar] [CrossRef] [PubMed]
  64. Eibe, F.; Hall, M.A.; Witten, I.H.; Pal, J. The WEKA workbench. In Online Appendix for Data Mining: Practical Machine Learning Tools and Techniques; Morgan Kauffmann: Cambridge, UK, 2016; Volume 4. [Google Scholar]
  65. Hall, M.; Frank, E.; Holmes, G.; Pfahringer, B.; Reutemann, P.; Witten, I.H. The WEKA data mining software: An update. ACM Sigkdd Explor. Newsl. 2009, 11, 10–18. [Google Scholar] [CrossRef]
  66. Fushiki, T. Estimation of prediction error by using K-fold cross-validation. Stat. Comput. 2011, 21, 137–146. [Google Scholar] [CrossRef]
Figure 1. Bar chart depicting the distribution of EEG files according to the quantity of missing data points. Each bar corresponds to the count of EEG files exhibiting specific numbers of absent channels, illustrating the prevalence and extent of incomplete data within the dataset analyzed.
Figure 2. Pre-processing pipeline for EEG signal analysis from the Sheffield database. The diagram details the workflow from initial EEG data acquisition at 512 Hz through the steps of file conversion, management of missing data via augmentation strategies, and feature extraction and the selection of 34 key attributes. The processed data are then partitioned into distinct datasets: original, optimized through Particle Swarm Optimization (PSO), and using evolutionary search techniques. Each dataset is split into training/validation and testing subsets, with an 80:20 ratio, to prepare for the machine learning analysis phase.
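As an illustration of the 80:20 partitioning step shown in Figure 2, the sketch below performs a stratified split with scikit-learn. The feature matrix, the labels, and the use of scikit-learn (rather than the WEKA environment actually employed in this work) are assumptions made purely for demonstration.

```python
# Minimal sketch of the 80:20 train/test partition described in Figure 2.
# X is a hypothetical feature matrix (one row per EEG segment, 34 attributes)
# and y holds placeholder binary labels (ASC vs. neurotypical).
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(560, 34))       # placeholder features
y = rng.integers(0, 2, size=560)     # placeholder class labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42
)
print(X_train.shape, X_test.shape)   # (448, 34) (112, 34)
```

Stratifying on the class label keeps the ASC/control proportion identical in the training/validation and testing subsets.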
Figure 3. Standardized electrode placement map for EEG recording. The image displays the electrode positions used during EEG data collection as configured in EEGLAB for the ASD 113 dataset. This particular file is unique as it includes complete data from all electrode placements, providing a comprehensive framework for EEG analysis.
Figure 4. List of the 34 explicit attributes extracted from the signals and their mathematical representations.
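The 34 explicit attributes and their mathematical definitions are given in Figure 4. As an illustration only, the sketch below computes a few generic per-channel descriptors of the same flavor (mean, standard deviation, skewness, kurtosis, RMS, zero-crossing count) with NumPy/SciPy; the descriptor choice is an assumption, and this is not the feature-extraction code used in the study.

```python
# Illustrative extraction of a few common per-channel signal descriptors.
# The paper's full set of 34 explicit attributes is defined in Figure 4;
# this sketch only shows representative examples of such statistics.
import numpy as np
from scipy.stats import skew, kurtosis

def channel_features(signal: np.ndarray) -> dict:
    """Compute a handful of descriptive statistics for one EEG channel."""
    zero_crossings = int(np.count_nonzero(np.diff(np.sign(signal)) != 0))
    return {
        "mean": float(np.mean(signal)),
        "std": float(np.std(signal)),
        "skewness": float(skew(signal)),
        "kurtosis": float(kurtosis(signal)),
        "rms": float(np.sqrt(np.mean(signal ** 2))),
        "zero_crossings": zero_crossings,
    }

# Example with a synthetic 2 s segment sampled at 512 Hz, as in the dataset.
fs = 512
t = np.arange(0, 2, 1 / fs)
segment = np.sin(2 * np.pi * 10 * t) + 0.1 * np.random.randn(t.size)
print(channel_features(segment))
```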
Figure 5. Workflow of the classifier training, validation, and testing phases. The diagram illustrates the initial training and validation phase where classifiers are subjected to 30 random initializations using the original, PSO, and evolutionary search datasets. Subsequent to model evaluation, the top-performing models from each dataset category undergo a single round of testing. This final phase includes a comprehensive classification and result analysis, culminating in the identification of the most accurate predictive model for ASD diagnosis based on EEG data.
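A minimal sketch of the evaluation protocol in Figure 5, i.e., 30 random initializations each evaluated by 10-fold cross-validation, is given below. It assumes scikit-learn and an arbitrary RBF-kernel SVM as a stand-in for the WEKA classifiers reported in the tables.

```python
# Sketch of the "30 random initializations x 10-fold cross-validation"
# protocol illustrated in Figure 5, using scikit-learn as a stand-in.
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(560, 34))          # placeholder feature matrix
y = rng.integers(0, 2, size=560)        # placeholder labels

run_means = []
for seed in range(30):                  # 30 independent initializations
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    clf = SVC(kernel="rbf", gamma=0.1)  # kernel setting chosen arbitrarily here
    scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
    run_means.append(scores.mean())

print(f"accuracy = {np.mean(run_means):.4f} ± {np.std(run_means):.4f}")
```

Reporting the mean and standard deviation over the 30 runs yields figures in the same "accuracy ± deviation" format used in Tables 1 to 3.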
Figure 6. Boxplot of the accuracy of the best models from each approach in the training stage.
Figure 7. Boxplot of the kappa index of the best models from each approach in the training stage.
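For reference, the kappa index reported in Figure 7 and in Tables 1 to 4 is the standard Cohen's kappa coefficient, which corrects the observed agreement p_o for the agreement p_e expected by chance. For a binary confusion matrix with N instances,

\[
\kappa = \frac{p_o - p_e}{1 - p_e}, \qquad
p_o = \frac{TP + TN}{N}, \qquad
p_e = \frac{(TP + FN)(TP + FP) + (TN + FP)(TN + FN)}{N^{2}}.
\]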
Figure 8. Confusion matrices of the tests performed with (a) the original dataset and the SVM model, (b) the PSO-reduced dataset and the SVM model, and (c) the evolutionary-search-reduced dataset and the Random Forest model with 500 trees.
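The sensitivity and specificity values in Table 4 follow directly from the confusion matrices in Figure 8. A minimal sketch of the derivation, assuming scikit-learn and placeholder labels rather than the study's actual predictions, is shown below.

```python
# Sketch of deriving sensitivity and specificity from a binary confusion
# matrix, as in Figure 8 and Table 4; labels and predictions are placeholders.
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])   # 1 = ASC, 0 = neurotypical (placeholder)
y_pred = np.array([1, 0, 1, 0, 0, 0, 1, 1])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
sensitivity = tp / (tp + fn)   # true positive rate
specificity = tn / (tn + fp)   # true negative rate
print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```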
Table 1. Performance metrics (accuracy, kappa statistic, sensitivity, specificity, AUC) of various classifiers trained on the full EEG dataset without feature selection. Results represent average values ± standard deviation across 30 independent runs using 10-fold cross-validation.
Classifier | Accuracy (%) | Kappa Statistic | Sensitivity | Specificity | AUC
Bayes Net | 74.20 ± 2.07 | 0.48 ± 0.04 | 0.76 ± 0.03 | 0.72 ± 0.03 | 0.81 ± 0.02
Naive Bayes | 58.58 ± 1.67 | 0.17 ± 0.03 | 0.25 ± 0.03 | 0.92 ± 0.02 | 0.60 ± 0.02
Random Tree | 90.04 ± 1.38 | 0.80 ± 0.03 | 0.90 ± 0.02 | 0.90 ± 0.02 | 0.90 ± 0.01
Random Forest, 10 trees | 95.66 ± 0.99 | 0.91 ± 0.02 | 0.97 ± 0.01 | 0.94 ± 0.02 | 0.99 ± 0.00
Random Forest, 100 trees | 98.33 ± 0.59 | 0.97 ± 0.01 | 0.98 ± 0.01 | 0.98 ± 0.01 | 1.00 ± 0.00
Random Forest, 500 trees | 98.54 ± 0.54 | 0.97 ± 0.01 | 0.99 ± 0.01 | 0.98 ± 0.01 | 1.00 ± 0.00
SVM, polynomial kernel (linear) | 96.36 ± 0.87 | 0.93 ± 0.02 | 0.96 ± 0.01 | 0.97 ± 0.01 | 0.96 ± 0.01
SVM, polynomial kernel (degree 2) | 97.84 ± 0.69 | 0.96 ± 0.01 | 0.98 ± 0.01 | 0.98 ± 0.01 | 0.98 ± 0.01
SVM, polynomial kernel (degree 3) | 98.15 ± 0.79 | 0.96 ± 0.02 | 0.98 ± 0.01 | 0.98 ± 0.01 | 0.98 ± 0.01
SVM, RBF kernel (σ = 0.5) | 92.99 ± 1.50 | 0.86 ± 0.03 | 0.98 ± 0.01 | 0.88 ± 0.03 | 0.93 ± 0.01
SVM, RBF kernel (σ = 0.2) | 98.24 ± 0.59 | 0.96 ± 0.01 | 0.99 ± 0.01 | 0.97 ± 0.01 | 0.98 ± 0.01
SVM, RBF kernel (σ = 0.1) | 99.13 ± 0.44 | 0.98 ± 0.01 | 0.99 ± 0.01 | 0.99 ± 0.01 | 0.99 ± 0.00
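The σ values in the RBF rows of Tables 1 to 3 denote the kernel width. Assuming the usual Gaussian parameterization (other toolkits expose γ instead, with γ = 1/(2σ²)), the kernel is

\[
K(\mathbf{x}, \mathbf{x}') = \exp\!\left( -\frac{\lVert \mathbf{x} - \mathbf{x}' \rVert^{2}}{2\sigma^{2}} \right),
\]

so smaller σ produces a narrower kernel and a more flexible decision boundary.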
Table 2. Performance metrics (accuracy, kappa statistic, sensitivity, specificity, AUC) of various classifiers trained on the EEG dataset after feature selection by PSO. Results represent average values ± standard deviation across 30 independent runs using 10-fold cross-validation.
Classifier | Accuracy (%) | Kappa Statistic | Sensitivity | Specificity | AUC
Bayes Net | 74.26 ± 2.09 | 0.49 ± 0.04 | 0.80 ± 0.03 | 0.69 ± 0.03 | 0.82 ± 0.02
Naive Bayes | 59.44 ± 1.59 | 0.19 ± 0.03 | 0.27 ± 0.03 | 0.91 ± 0.02 | 0.75 ± 0.02
Random Tree | 90.55 ± 1.49 | 0.81 ± 0.03 | 0.91 ± 0.02 | 0.90 ± 0.02 | 0.91 ± 0.01
Random Forest, 10 trees | 95.87 ± 0.94 | 0.92 ± 0.02 | 0.98 ± 0.01 | 0.94 ± 0.02 | 0.99 ± 0.00
Random Forest, 100 trees | 98.21 ± 0.61 | 0.96 ± 0.01 | 0.98 ± 0.01 | 0.98 ± 0.01 | 1.00 ± 0.00
Random Forest, 500 trees | 98.37 ± 0.57 | 0.97 ± 0.01 | 0.99 ± 0.01 | 0.98 ± 0.01 | 1.00 ± 0.00
SVM, polynomial kernel (linear) | 92.33 ± 1.11 | 0.84 ± 0.02 | 0.91 ± 0.01 | 0.92 ± 0.01 | 0.92 ± 0.01
SVM, polynomial kernel (degree 2) | 98.17 ± 0.58 | 0.96 ± 0.01 | 0.98 ± 0.00 | 0.98 ± 0.00 | 0.98 ± 0.00
SVM, polynomial kernel (degree 3) | 98.48 ± 0.53 | 0.96 ± 0.01 | 0.98 ± 0.00 | 0.98 ± 0.00 | 0.98 ± 0.00
SVM, RBF kernel (σ = 0.5) | 98.86 ± 0.51 | 0.97 ± 0.01 | 0.99 ± 0.00 | 0.98 ± 0.00 | 0.98 ± 0.00
SVM, RBF kernel (σ = 0.2) | 99.23 ± 0.38 | 0.98 ± 0.00 | 0.99 ± 0.00 | 0.99 ± 0.00 | 0.99 ± 0.00
SVM, RBF kernel (σ = 0.1) | 98.50 ± 0.55 | 0.97 ± 0.01 | 0.98 ± 0.00 | 0.98 ± 0.00 | 0.98 ± 0.00
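Table 2 corresponds to the attribute subset selected by PSO. The sketch below outlines one common way to implement wrapper-style feature selection with binary PSO, using sigmoid-transformed velocities and cross-validated accuracy as the fitness function; the swarm size, inertia, acceleration constants, iteration count, and the SVM wrapper are illustrative assumptions, not the configuration used in this study.

```python
# Illustrative binary PSO for wrapper-style feature selection.
# Swarm size, inertia (w), acceleration constants (c1, c2) and iteration
# count are placeholder assumptions, not the study's actual settings.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 34))            # placeholder feature matrix
y = rng.integers(0, 2, size=200)          # placeholder labels

n_particles, n_features, n_iter = 10, X.shape[1], 10
w, c1, c2 = 0.7, 1.5, 1.5

def fitness(bits: np.ndarray) -> float:
    """Cross-validated SVM accuracy on the attribute subset encoded by bits."""
    mask = bits.astype(bool)
    if not mask.any():
        return 0.0
    return cross_val_score(SVC(kernel="rbf"), X[:, mask], y, cv=5).mean()

pos = rng.integers(0, 2, size=(n_particles, n_features))   # 0/1 positions
vel = rng.normal(scale=0.1, size=(n_particles, n_features))
pbest = pos.copy()
pbest_fit = np.array([fitness(p) for p in pos])
gbest = pbest[np.argmax(pbest_fit)].copy()

for _ in range(n_iter):
    r1, r2 = rng.random(vel.shape), rng.random(vel.shape)
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    prob = 1.0 / (1.0 + np.exp(-vel))                       # sigmoid transfer
    pos = (rng.random(vel.shape) < prob).astype(int)
    fit = np.array([fitness(p) for p in pos])
    better = fit > pbest_fit
    pbest[better], pbest_fit[better] = pos[better], fit[better]
    gbest = pbest[np.argmax(pbest_fit)].copy()

print("selected attribute indices:", np.flatnonzero(gbest))
```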
Table 3. Performance metrics (accuracy, kappa statistic, sensitivity, specificity, AUC) of various classifiers trained on the EEG dataset after feature selection by evolutionary search. Results represent average values ± standard deviation across 30 independent runs using 10-fold cross-validation.
Classifier | Accuracy (%) | Kappa Statistic | Sensitivity | Specificity | AUC
Bayes Net | 68.51 ± 2.58 | 0.36 ± 0.05 | 0.74 ± 0.03 | 0.62 ± 0.03 | 0.74 ± 0.02
Naive Bayes | 55.73 ± 1.50 | 0.11 ± 0.02 | 0.18 ± 0.02 | 0.93 ± 0.01 | 0.62 ± 0.02
Random Tree | 87.43 ± 1.47 | 0.74 ± 0.02 | 0.87 ± 0.02 | 0.87 ± 0.02 | 0.87 ± 0.01
Random Forest, 10 trees | 91.40 ± 1.23 | 0.82 ± 0.02 | 0.93 ± 0.01 | 0.89 ± 0.01 | 0.97 ± 0.00
Random Forest, 100 trees | 93.70 ± 1.09 | 0.87 ± 0.02 | 0.93 ± 0.01 | 0.93 ± 0.01 | 0.98 ± 0.00
Random Forest, 500 trees | 93.91 ± 1.10 | 0.87 ± 0.02 | 0.93 ± 0.01 | 0.94 ± 0.01 | 0.98 ± 0.00
SVM, polynomial kernel (linear) | 63.84 ± 2.05 | 0.28 ± 0.04 | 0.49 ± 0.03 | 0.78 ± 0.03 | 0.64 ± 0.02
SVM, polynomial kernel (degree 2) | 70.24 ± 1.98 | 0.41 ± 0.04 | 0.58 ± 0.03 | 0.83 ± 0.03 | 0.70 ± 0.02
SVM, polynomial kernel (degree 3) | 72.59 ± 1.75 | 0.45 ± 0.03 | 0.59 ± 0.03 | 0.87 ± 0.02 | 0.73 ± 0.02
SVM, RBF kernel (σ = 0.5) | 71.04 ± 1.97 | 0.42 ± 0.04 | 0.64 ± 0.03 | 0.78 ± 0.03 | 0.71 ± 0.02
SVM, RBF kernel (σ = 0.2) | 63.50 ± 1.99 | 0.27 ± 0.04 | 0.50 ± 0.03 | 0.78 ± 0.03 | 0.64 ± 0.02
SVM, RBF kernel (σ = 0.1) | 61.13 ± 1.97 | 0.22 ± 0.04 | 0.44 ± 0.03 | 0.78 ± 0.03 | 0.61 ± 0.02
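Analogously, Table 3 corresponds to the subset selected by evolutionary search. A compact genetic-algorithm sketch of such a wrapper is shown below; the population size, crossover and mutation settings, and the Random Forest wrapper are again illustrative assumptions rather than the study's actual configuration.

```python
# Illustrative genetic-algorithm (evolutionary search) feature selection.
# Population size, crossover/mutation settings and generation count are
# placeholder assumptions, not the study's configuration.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 34))            # placeholder feature matrix
y = rng.integers(0, 2, size=200)          # placeholder labels

pop_size, n_features, n_gen, p_mut = 12, X.shape[1], 10, 0.05

def fitness(bits: np.ndarray) -> float:
    """Cross-validated Random Forest accuracy on the encoded attribute subset."""
    mask = bits.astype(bool)
    if not mask.any():
        return 0.0
    clf = RandomForestClassifier(n_estimators=50, random_state=0)
    return cross_val_score(clf, X[:, mask], y, cv=5).mean()

pop = rng.integers(0, 2, size=(pop_size, n_features))
for _ in range(n_gen):
    fit = np.array([fitness(ind) for ind in pop])
    # Tournament selection: keep the fitter of two randomly drawn individuals.
    parents = np.array([
        pop[max(rng.integers(0, pop_size, 2), key=lambda i: fit[i])]
        for _ in range(pop_size)
    ])
    # One-point crossover between consecutive parent pairs.
    children = parents.copy()
    for i in range(0, pop_size - 1, 2):
        cut = rng.integers(1, n_features)
        children[i, cut:], children[i + 1, cut:] = (
            parents[i + 1, cut:].copy(), parents[i, cut:].copy())
    # Bit-flip mutation.
    flip = rng.random(children.shape) < p_mut
    children[flip] = 1 - children[flip]
    pop = children

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected attribute indices:", np.flatnonzero(best))
```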
Table 4. Test results for the best model of each approach, reported in terms of the performance metrics (accuracy, kappa statistic, sensitivity, specificity, AUC).
Classifier | Accuracy (%) | Kappa Statistic | Sensitivity | Specificity | AUC
Original dataset, SVM RBF (σ = 0.1) | 94.77 | 0.89 | 0.95 | 0.94 | 0.95
PSO, SVM RBF (σ = 0.2) | 96.71 | 0.93 | 0.97 | 0.97 | 0.97
Evolutionary search, Random Forest (500 trees) | 95.44 | 0.91 | 0.95 | 0.96 | 0.92
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
