1. Introduction
Neurological disorders (ND) encompass both central and peripheral nervous system diseases, including neurodevelopmental, neurodegenerative, and psychiatric conditions [1]. NDs are a primary cause of mortality and impaired quality of life worldwide. Early diagnosis can help slow the progression of many illnesses, if not eliminate them entirely [
2]. Common ND include Parkinson’s disease (PD), epilepsy, mild cognitive impairment (MCI), schizophrenia, and Alzheimer’s disease (AD), as well as cerebrovascular diseases like stroke, brain tumors, and developmental disorders like autism and attention deficit hyperactivity disorder (ADHD) [
1]. In addition, mood disorders such as depression are regarded as major ND due to their substantial influence on cognitive and emotional processes. Depression, which is frequently associated with disorders such as Alzheimer’s and schizophrenia, is increasingly being explored for its neurological roots and consequences on brain function and structure [
3].
These disorders can cause major complications, including memory loss and neurological malfunction, and have a significant impact on patients’ and families’ daily lives. Early diagnosis is therefore essential for timely treatment, yet early detection is often challenging. Current diagnostic methods, including clinical examination, neuropsychological assessment, and neuroimaging (MRI, fMRI, PET), are effective but often costly, time-consuming, and dependent on specialized expertise [4,5]. In regions with limited access to specialists, delays in diagnosis are common. Additionally, some imaging procedures are invasive or rely on radioactive tracers, posing further risks. As a result, electroencephalography (EEG) has emerged as an attractive alternative due to its non-invasive nature, high temporal resolution, portability, and low cost [
5,
6]. EEG records brain electrical activity and can capture functional abnormalities across neurological and psychiatric disorders [
7]. However, manual EEG interpretation is labor-intensive, prone to subjectivity, and challenged by the signal’s low signal-to-noise ratio and complexity [
8].
To overcome these challenges, computer-aided diagnosis (CAD) systems using machine learning (ML) have been developed to automate EEG analysis. ML algorithms can identify patterns that may not be apparent to human observers, improving diagnostic accuracy and reducing interpretation time. However, few studies have explored a unified framework for multi-class classification of different ND. Most recent EEG-based studies have focused on binary classification of one ND (epilepsy [
9], schizophrenia [
10], AD [
11], MCI [
12], depression [
13]) versus healthy controls. Deep learning methods, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), have also been applied to EEG-based ND classification [
14,
15,
16]. While these approaches can capture complex hierarchical patterns, they require large datasets for optimal performance, demand high computational resources, and often act as “black boxes” with limited interpretability [
17]. These limitations make traditional ML approaches with engineered features more practical in many clinical settings, especially when datasets are small or medium-sized.
Table 1 summarizes representative EEG-based classification studies, highlighting the diversity in disorders, features, and classifiers, as well as the performance achieved.
Although many binary classification studies have achieved high performance, multi-class classification—which allows the simultaneous differentiation of multiple disorders—remains less explored. This limitation necessitates multiple CAD systems to cover different diseases, increasing cost and complexity. Most multi-class EEG studies are limited to three-class classification, such as AD vs. MCI vs. controls [
24,
25,
26,
27,
28] or focus on specific neuropsychiatric disorders like depression and schizophrenia [
29,
30,
31,
32,
33]. Disease-to-disease classification (e.g., AD vs. schizophrenia or AD vs. depression) is particularly rare. Indeed, to the best of my knowledge, no EEG-based study has compared or classified AD and depression within a complete machine learning framework. Addressing these gaps could improve understanding of the unique characteristics and similarities across these diseases, perhaps leading to more effective diagnostic and treatment methods. A unified framework for binary and multi-class classification of multiple NDs could improve diagnostic specificity, reduce cost, and enhance clinical applicability.
This work addresses this gap by proposing an ML framework for binary, three-class, and four-class classification of AD, depression, schizophrenia, MCI, and healthy controls. Features based on time, frequency, entropy, and complexity measures were extracted, feature selection was applied to identify the most informative EEG channels, and performance was evaluated across multiple classifiers. Unlike previous studies in the literature, this work performs multi-class classification of multiple disorders within a single framework. Beyond classification, it investigates whether certain EEG channels could serve as potential biomarkers for these disorders. The approach aims to provide a cost-effective, interpretable, and scalable CAD system suitable for clinical environments with limited resources.
3. Experimental Results
In this section, the results of feature selection, statistical analysis, and classification studies are given.
3.1. Feature Selection Results
The Lasso algorithm was used for feature selection to reduce the data burden and improve classification performance. Determining the optimal value of the shrinkage parameter lambda (λ) in Lasso is a challenge, however. The λ value was therefore varied iteratively from 0.001 to 0.05 and examined with 10-fold cross-validation, as seen in Figure 4. The value with the lowest cross-validation loss (λ = 0.001), which also yielded the highest classification accuracy, was chosen. Information about the features selected for binary classification is given in Figure 5, and the number of features selected for each two-class (binary) classification is given in Table 5.
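The λ sweep described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in the study: it assumes a squared-error Lasso solved by coordinate descent, synthetic data in which two of five features carry signal, a hypothetical five-value grid between 0.001 and 0.05, and interleaved folds. All function names are placeholders.

```python
import random

def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=50):
    """Coordinate descent for (1/2n)*||y - Xw||^2 + lam*||w||_1."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho, z = 0.0, 0.0
            for i in range(n):
                # Prediction for sample i without feature j's contribution
                partial = sum(w[k] * X[i][k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, lam) / (z / n) if z > 0 else 0.0
    return w

def cv_loss(X, y, lam, k=10):
    """Mean squared error over k interleaved cross-validation folds."""
    n = len(X)
    total = 0.0
    for f in range(k):
        test_idx = set(range(f, n, k))
        X_tr = [X[i] for i in range(n) if i not in test_idx]
        y_tr = [y[i] for i in range(n) if i not in test_idx]
        w = lasso_cd(X_tr, y_tr, lam)
        for i in test_idx:
            pred = sum(wj * xj for wj, xj in zip(w, X[i]))
            total += (y[i] - pred) ** 2
    return total / n

# Synthetic data (assumption): only features 0 and 2 carry signal.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(5)] for _ in range(40)]
y = [2.0 * r[0] - 1.5 * r[2] + random.gauss(0, 0.1) for r in X]

lambda_grid = [0.001, 0.005, 0.01, 0.02, 0.05]  # hypothetical grid
best_lam = min(lambda_grid, key=lambda lam: cv_loss(X, y, lam))
weights = lasso_cd(X, y, best_lam)
selected = [j for j, wj in enumerate(weights) if wj != 0.0]
print(best_lam, selected)
```

Features whose coefficients are shrunk exactly to zero are dropped; the remaining indices play the role of the selected feature set. For a classification target, a logistic-loss variant would be the closer analogue.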
Figure 6 shows the number of features selected per channel for binary classification. Examining the EEG channel selections for the different ND shows that specific channels emerged for each disease. For AD, channel F7 had the highest number of selected features (9), followed by O2 (8 features), indicating their importance in distinguishing this disorder. In depression, Fz and Fp2 showed the greatest relevance, with 10 and 9 selected features, respectively. For MCI, P4 was the most prominent channel with 8 selected features, followed by Fp1 and Fz with 7 features each. Finally, in schizophrenia, F7 again had the highest feature count with 10 selected features, followed by Fp2 and F4 (9 and 7 features, respectively). These channels point to brain regions that are important for classifying these disorders.
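A per-channel tally of this kind is straightforward to compute once the selected feature names are known. The sketch below assumes a hypothetical naming scheme of the form `<channel>_<feature>`; the names and counts are illustrative and are not taken from the study.

```python
from collections import Counter

# Hypothetical selected-feature names in "<channel>_<feature>" form
selected_features = [
    "F7_complexity", "F7_mean", "O2_entropy",
    "Fp1_variance", "F7_entropy", "O2_mean",
]

def features_per_channel(names):
    """Count how many selected features belong to each EEG channel."""
    return Counter(name.split("_", 1)[0] for name in names)

counts = features_per_channel(selected_features)
ranked = counts.most_common()  # channels sorted by number of selected features
print(ranked)
```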
3.2. Statistical Analysis Results
The discrimination of features among the five groups (healthy controls, AD, depression, MCI, and schizophrenia) was assessed using both ANOVA and Kruskal–Wallis tests. Following these tests, post hoc analyses were conducted to determine which groups differed significantly from each other. The detailed results, including specific group comparisons and statistical significance, are provided in the Supplementary Files. As seen in Supplementary File S1, 239 of the features differed significantly between the groups (p < 0.05). In the post hoc analysis, AD differed significantly from other groups 184 times, the control group 151 times, depression 119 times, schizophrenia 52 times, and MCI 28 times. These findings highlight the varying degrees of distinction between the conditions across different features, which are detailed further in Supplementary File S1. Box plots for complexity and spentropyDyd, which are among the features that reveal the most differences between the groups, are given in Figure 7. As seen in Figure 7, the complexity feature of the 8th EEG channel differed significantly between the AD–depression, depression–control, depression–MCI, and depression–schizophrenia groups. The post hoc Tukey test results for the 4 features in Figure 7 are given in Supplementary File S2.
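For readers unfamiliar with the Kruskal–Wallis test, the H statistic for one feature can be computed as in the following sketch. It is a plain-Python illustration with toy values, assuming average ranks for ties and the standard tie correction; it is not the statistical software used in the study, and the function names are placeholders.

```python
from collections import Counter

def average_ranks(values):
    """1-based ranks of the pooled values, with ties sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    return h / correction if correction > 0 else 0.0

# Toy feature values for three groups (illustrative only)
h_stat = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(h_stat)
```

Under the null hypothesis, H approximately follows a chi-square distribution with (number of groups − 1) degrees of freedom, so with five groups it would be compared against the critical value 9.488 at the 0.05 level.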
3.3. Parameter Tuning and Model Performance Timings
In the parameter tuning process for the machine learning algorithms, a Bayesian optimization approach was employed to explore the hyperparameter space efficiently. To ensure robust evaluation, 10-fold cross-validation was used to partition the data into training and test sets, and the training and testing times of the algorithms were recorded. This made it possible to compare the algorithms not only in terms of classification performance but also in terms of computational efficiency, highlighting the trade-offs between parameter optimization and execution speed. Table 6 reports the running times of the algorithms for two-class classification.
The training times show that RF has the shortest training time across all diseases, taking less than a second, which is considerably quicker than SVM, LDA, and ANN. LDA also trains relatively fast, though slightly slower than RF. ANN has the slowest training times, particularly for AD, but outperforms the others in testing time, with consistently faster results than SVM, LDA, and RF. These results suggest that LDA is efficient for both training and testing, while ANN takes longer to train but is very fast in testing.
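Timings of this kind can be collected by wrapping the fit and predict steps of each fold in a wall-clock timer. The sketch below is a generic harness under stated assumptions: interleaved 10-fold splitting, `time.perf_counter` as the clock, and a placeholder majority-label model standing in for SVM, LDA, ANN, or RF. None of the names come from the study.

```python
import time
from collections import Counter

def kfold_indices(n, k=10):
    """Yield (train_idx, test_idx) pairs for k interleaved folds."""
    for f in range(k):
        test_idx = list(range(f, n, k))
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        yield train_idx, test_idx

def time_cross_validation(fit, predict, X, y, k=10):
    """Return total training and testing time (seconds) over k folds."""
    train_time = test_time = 0.0
    for train_idx, test_idx in kfold_indices(len(X), k):
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        X_te = [X[i] for i in test_idx]
        t0 = time.perf_counter()
        model = fit(X_tr, y_tr)
        t1 = time.perf_counter()
        predict(model, X_te)
        t2 = time.perf_counter()
        train_time += t1 - t0
        test_time += t2 - t1
    return train_time, test_time

# Placeholder model (assumption): always predicts the majority training label.
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]

def predict_majority(model, X_test):
    return [model for _ in X_test]

X = [[float(i)] for i in range(50)]
y = [i % 2 for i in range(50)]
train_s, test_s = time_cross_validation(fit_majority, predict_majority, X, y)
print(f"train {train_s:.6f} s, test {test_s:.6f} s")
```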
3.4. Classification Results
In this study, the performance of various classification algorithms (SVM, LDA, ANN, RF) for the classification of ND using features extracted from EEG signals was evaluated with 10-fold cross-validation. Classification results are presented in tables as the mean ± standard deviation across the folds of the 10-fold cross-validation. Table 7 gives the performance measures of two-class (binary) control–disease classification.
Table 8 shows the results of binary classification for disease–disease classification. The classification results for control–disease and disease–disease comparisons reveal varying levels of effectiveness across machine learning models like SVM, LDA, ANN, and RF. In the
control–AD classification, LDA performed exceptionally well, achieving perfect scores for accuracy (1.0000 ± 0.0000), sensitivity, specificity, precision, F1 score, and AUC. This indicates that the model separated the control and AD groups perfectly on every metric. In contrast, RF performed considerably worse, with lower accuracy (0.7208 ± 0.1498) and only moderate AUC (0.7225 ± 0.1474). This pattern highlights LDA’s robustness in distinguishing AD compared to other models. For the
control–depression comparison, SVM delivered a high accuracy of 0.9589 ± 0.0945 and an impressive AUC of 0.9917 ± 0.0264, making it highly reliable in this classification. LDA also showed excellent results, with an accuracy of 0.9857 ± 0.0452 and similar performance in sensitivity and specificity. RF again underperformed with an accuracy of 0.7339 ± 0.1186, showing reduced sensitivity but moderate specificity (0.8550 ± 0.2088). The results emphasize how LDA and SVM succeed in capturing both the true positives (sensitivity) and true negatives (specificity) in the depression classification. In the
control–schizophrenia comparison, both LDA and SVM provided consistently strong performances. LDA achieved an accuracy of 0.9778 ± 0.0468, and SVM had 0.9403 ± 0.1010, with high AUC values for both models (0.9800 ± 0.0422 and 0.9738 ± 0.0641, respectively). Meanwhile, RF continued to lag behind, offering an accuracy of 0.7278 ± 0.1386, reflecting its challenges in accurately distinguishing schizophrenia from the control group. In the
control–MCI classification, LDA maintained strong performance with an accuracy of 0.9714 ± 0.0602, indicating its high reliability in distinguishing control subjects from MCI patients. SVM and ANN followed closely with the same accuracy of 0.9571, and SVM had a high AUC value (1.0000 ± 0.0000). In comparison, RF lagged behind, showing lower accuracy (0.7304 ± 0.1589) and reduced sensitivity, again confirming its relative inefficiency in handling neurological disorder classifications. Overall, LDA excelled in this control–MCI distinction. These findings underline LDA’s superior generalizability across various ND.
In the disease–disease classification, particularly for the AD–MCI comparison, LDA remained highly effective with 0.9690 ± 0.0655 accuracy, further supported by perfect specificity and precision (1.0000 ± 0.0000). SVM and ANN also yielded high accuracies (0.9262 ± 0.0781) but still underperformed compared to LDA. RF, once again, demonstrated lower efficiency, showing variability across metrics, particularly in specificity and AUC. For the AD–schizophrenia classification, LDA performed well with 0.9278 ± 0.0624 accuracy and balanced sensitivity/specificity. Although SVM showed a lower accuracy of 0.8667 ± 0.1220, its AUC was still relatively strong (0.9238 ± 0.1256). ANN outperformed SVM with an accuracy of 0.9028 ± 0.1143. RF had the lowest accuracy (0.7181 ± 0.1058), confirming the trend of reduced effectiveness compared to other methods. For the depression–AD classification, LDA achieved an accuracy of 0.9429 ± 0.0738, demonstrating solid performance across most metrics. SVM followed with an accuracy of 0.8810 ± 0.0635 and ANN with 0.8238 ± 0.1118, the two showing comparable AUCs. RF again had lower accuracy at 0.7333 ± 0.1830. In the depression–MCI classification, SVM achieved the highest performance with an accuracy of 0.9667 ± 0.0703, displaying excellent consistency across other metrics like specificity and precision (0.9667 ± 0.1054 for both). LDA followed closely with an accuracy of 0.9633 ± 0.0777, showing perfect precision and specificity (1.0000 ± 0.0000). ANN performed comparably with an accuracy of 0.9467 ± 0.0864, while RF struggled with an accuracy of 0.6700 ± 0.2457, highlighting its lower effectiveness in this classification task. In the schizophrenia–MCI classification, LDA outperformed other models with an accuracy of 0.9571 ± 0.0690, along with strong specificity and precision (0.9750 ± 0.0791 for both). SVM showed a lower accuracy of 0.8714 ± 0.1251 but still delivered reliable precision (0.9000 ± 0.1748). ANN underperformed compared to both, with an accuracy of 0.7857 ± 0.2156, reflecting its variability. RF showed only moderate performance with an accuracy of 0.7429 ± 0.1622, lagging behind LDA and SVM in both sensitivity and specificity. Similarly, in the depression–schizophrenia comparison, LDA excelled, showing an accuracy of 0.9571 ± 0.0690 and nearly perfect AUC, indicating a strong ability to differentiate between these disorders. RF’s performance remained considerably lower, with an accuracy of 0.6857 ± 0.1475 and sensitivity of 0.4333 ± 0.3063, confirming its struggle with disease-to-disease distinctions.
Overall, LDA consistently delivered the highest classification performance in both control–disease and disease–disease comparisons, across multiple ND. SVM also performed well, especially in distinguishing depression and AD from the control group. ANN, similarly to SVM, demonstrated effective performance in most cases, offering a viable alternative for classification. RF, on the other hand, displayed the weakest performance in most cases, with lower accuracy, sensitivity, and AUC scores, suggesting that it may not be as suitable for these medical classifications. These results indicate that LDA and SVM are generally better suited for binary classifications involving ND, while RF may require optimization to improve its performance in these contexts.
In the disease–disease classification, AD-MCI was the best-performing classification, with LDA achieving the highest accuracy of 0.9690 ± 0.0655. This result was reinforced by perfect specificity and precision, making LDA the most reliable model in this context. Similarly, for the Depression-MCI comparison, SVM stood out with an accuracy of 0.9667 ± 0.0703, demonstrating its consistency across various metrics like sensitivity, specificity, and AUC, indicating its strong capability in differentiating between these two conditions effectively.
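The metrics reported above follow from the per-fold confusion counts. The sketch below shows one way to compute them and summarize them as mean ± standard deviation; the fold counts are made up for illustration, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the convention used in the study is not stated.

```python
from statistics import mean, stdev

def binary_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, specificity, precision, and F1 from confusion counts."""
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    prec = tp / (tp + fp) if tp + fp else 0.0
    acc = (tp + tn) / (tp + fn + tn + fp)
    f1 = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
    return {"accuracy": acc, "sensitivity": sens, "specificity": spec,
            "precision": prec, "f1": f1}

def summarize(fold_counts):
    """Mean and sample standard deviation of each metric across folds."""
    per_fold = [binary_metrics(*counts) for counts in fold_counts]
    summary = {}
    for name in per_fold[0]:
        values = [m[name] for m in per_fold]
        summary[name] = (mean(values), stdev(values))
    return summary

# Hypothetical (tp, fn, tn, fp) counts for three folds
folds = [(2, 0, 2, 0), (1, 1, 2, 0), (2, 0, 1, 1)]
for name, (m, s) in summarize(folds).items():
    print(f"{name}: {m:.4f} ± {s:.4f}")
```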
Table 9 and
Table 10 present the results of the three-class classification using the LDA algorithm. Since LDA was more successful than the other algorithms in binary classification, only LDA results are given for the 3-class and 4-class classification. Among the control–disease–disease groups, control–MCI–AD achieved the highest accuracy at 0.7932 ± 0.1041, along with strong sensitivity and specificity values. Among the disorder-only groups, the depression–MCI–schizophrenia classification performed best, with 0.8467 ± 0.1088 accuracy and a high AUC of 0.9128 ± 0.1090, indicating effective separation between these three disorders. In contrast, the depression–schizophrenia–AD group showed the lowest performance, with accuracy dropping to 0.6455 ± 0.1088 and lower precision and F-score values. In
Table 9 and
Table 10, the values highlighted in red correspond to the best-performing results among the evaluated algorithms.
Table 11 presents the results of the four-class classification using the LDA algorithm. The classification accuracy varied across the different combinations of classes. The highest accuracy was observed in the control–depression–schizophrenia–cognitive decline classification, at 0.5789 ± 0.1309. This was followed closely by the control–depression–schizophrenia–MCI classification, with an accuracy of 0.5605 ± 0.1410. The control–depression–schizophrenia–AD classification exhibited a lower accuracy of 0.5146 ± 0.1148, indicating challenges in distinguishing between these conditions. The lowest accuracy was recorded in the depression–schizophrenia–AD–MCI classification, at 0.4857 ± 0.1171. Across all four-class classifications, the sensitivity, specificity, precision, and F-score values also reflected the varying degrees of model performance, highlighting the complexities of multi-class classification tasks.
The confusion matrices in
Figure 8 illustrate the classification performance of the LDA model across four experimental scenarios. In the control–AD–MCI task (a), the control class achieved the highest correct classification rate, whereas partial misclassification occurred between AD and MCI. In the depression–MCI–schizophrenia task (b), schizophrenia was identified with high accuracy, while MCI and depression exhibited greater overlap. The four-class control–depression–schizophrenia–MCI scenario (c) proved more challenging, with notable misclassification between depression and schizophrenia. Similarly, in the control–depression–schizophrenia–cognitive decline scenario (d), confusion was observed between control and cognitive decline, as well as between depression and schizophrenia. Overall, the four-class configurations showed higher misclassification rates than the three-class ones, reflecting the higher complexity and feature overlap among certain neurological disorders.
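A multi-class confusion matrix and the per-class correct classification rate read from it can be computed as in the sketch below. The labels and predictions are toy values chosen for illustration and do not reproduce Figure 8.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

def per_class_sensitivity(matrix):
    """Fraction of each true class that was predicted correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(matrix)]

labels = ["control", "AD", "MCI"]
y_true = ["control", "control", "AD", "AD", "MCI", "MCI"]
y_pred = ["control", "control", "AD", "MCI", "AD", "MCI"]  # toy predictions

cm = confusion_matrix(y_true, y_pred, labels)
rates = per_class_sensitivity(cm)
for label, row, rate in zip(labels, cm, rates):
    print(label, row, rate)
```

Off-diagonal entries show which pairs of classes are confused with each other, which is how overlaps such as AD versus MCI become visible.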
4. Discussion
Clinical EEG diagnosis presents several challenges: firstly, the accuracy of diagnoses is largely contingent upon the expertise of highly trained EEG specialists. Secondly, acquiring the skills necessary to interpret EEG recordings requires extensive pathological education over several years. Lastly, the process of analyzing EEG data is labor-intensive and can be both time-consuming and mentally taxing [
65]. Therefore, there is a growing need for a computer-based system capable of automatically diagnosing multiple ND, which would streamline the process, reduce the burden on specialists, and improve diagnostic accuracy.
Relatively little research on ND uses nonlinear EEG analysis techniques; most studies rely on linear approaches such as power spectral density and frequency analysis. Nonlinear approaches, on the other hand, are gaining popularity because they can capture more complicated dynamics in brain signals, providing insights that linear measures may overlook, particularly when discriminating between various neurological diseases. For example, Higuchi’s fractal dimension (HFD) has been shown to reflect higher complexity in depressed patients than in healthy control individuals in all brain areas. In classifying depression, an enhanced probabilistic neural network model achieved 91.3% accuracy based on seven frontal EEG channels, indicating its possible applicability in the clinical field for mental health diagnostics [
66]. Also, the Lempel–Ziv complexity (LZC) of multi-channel resting EEG has been shown to be effective in evaluating a variety of neurological and mental diseases, including serious depression [
67,
68].
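To make the notion of Lempel–Ziv complexity concrete, the sketch below counts the phrases in the exhaustive-history (LZ76) parsing of a binary sequence. Binarizing the signal around its median is a common convention and is assumed here; this is an illustration, not the feature extraction code of the cited studies or of this work.

```python
from statistics import median

def binarize(signal):
    """Map samples above the median to '1' and the rest to '0'."""
    m = median(signal)
    return "".join("1" if x > m else "0" for x in signal)

def lz76_complexity(s):
    """Number of phrases in the LZ76 (exhaustive history) parsing of s."""
    n, i, count = len(s), 0, 0
    while i < n:
        length = 1
        # Extend the phrase while it can be copied from the preceding text
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

# Parses as 0 | 001 | 10 | 100 | 1000 | 101
print(lz76_complexity("0001101001000101"))
print(lz76_complexity(binarize([0.1, 0.9, 0.2, 0.8, 0.3, 0.7])))
```

A more regular signal yields fewer phrases, so a lower count corresponds to reduced signal irregularity.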
Numerous studies show a significant correlation between cognitive decline and the reduction in EEG irregularity [
40,
69]. For this reason, in this study, both frequency-domain and time-domain (entropy, statistical measures) features were extracted from EEG signals. Although previous research has proposed frameworks for multi-class EEG classification, to the best of my knowledge, no study has used a single machine learning framework both to distinguish more than two diseases from healthy controls and to perform disease–disease classification. The aim of this study is to develop a system capable of automatically detecting four ND (AD, schizophrenia, depression, and MCI) using EEG signals, improving diagnostic speed and accuracy. Related studies in the literature are shown in
Table 12.
Comparing the present results with selected studies in
Table 12, the proposed LDA model achieved notably higher accuracy in several cases. For instance, in the classification of schizophrenia and depression, this study reached 95.71% accuracy with 95% sensitivity and specificity, outperforming the results of Jang et al. [
71]. In the work of Wang et al. [
33], a 10-fold cross-validation approach with LDA yielded 74.32% accuracy, which is lower than the performance observed in the present study for similar classification tasks. Likewise, when compared with the results of Hassanzadeh et al. [
73], the proposed method achieved 92.78% accuracy in the same two-class classification and 76.03% accuracy in the equivalent three-class classification, both exceeding the accuracies reported in their work.
In this study, the LDA classifier also produced accuracies of 100% for control–AD, 97.14% for control–MCI, 96.90% for MCI–AD, and 79.32% for 3-class classification (AD–MCI–control). This performance is consistent with previous EEG-based ND classification studies, where accuracies typically ranged between 70% and 98% depending on the dataset size, feature set, and classification task. For example, Huang et al. [
70] reported 88.2% accuracy in distinguishing depression from healthy controls using KNN after Lasso-based feature selection. Wang et al. [
33] achieved 79.27% accuracy in a three-class classification of schizophrenia, depression, and healthy controls using a convolutional neural network (MUCHf-Net). Cheng et al. [
72] applied dynamic functional connectivity features with a random forest model to classify four groups (nonpsychotic major depression, psychotic major depression, schizophrenia, and healthy controls), obtaining 73.1% accuracy. Compared to these works, the present study’s LDA approach demonstrated competitive or superior performance in both binary and multi-class settings while using a broader classification framework encompassing multiple ND combinations.
According to
Table 7 and
Table 8, SVM also performed strongly, particularly in control–depression classification, with an accuracy of 0.9589 ± 0.0945 and an AUC of 0.9917 ± 0.0264. In the control–schizophrenia classification, ANN outperformed SVM, with an accuracy of 0.9639 ± 0.0583 compared with 0.9403 ± 0.1010 for SVM. RF, however, exhibited weaker results, with accuracy ranging from 0.6700 to 0.7524 depending on the comparison, and often underperformed in sensitivity and AUC.
In the disease–disease comparisons, similar trends were observed. LDA consistently achieved high accuracy, sensitivity, and specificity values, especially in the AD–MCI comparison (accuracy of 0.9690 ± 0.0655, AUC of 0.9750 ± 0.0527). SVM also showed solid performance, with accuracy values exceeding 0.90 in most comparisons. ANN generally performed well but fell short of LDA in some classifications; in the depression–MCI comparison, for example, ANN reached an accuracy of 0.9467 ± 0.0864 against 0.9633 ± 0.0777 for LDA. RF again struggled, with accuracy values ranging from 0.6700 to 0.7524 and often lower sensitivity and AUC values than LDA, ANN, and SVM.
The results from
Table 9 and
Table 10 demonstrate the performance of the LDA algorithm in three-class classification tasks involving various ND. In the control–disease–disease classification (
Table 9), the highest accuracy was observed in the control–MCI–Alzheimer classification, with an accuracy of 0.7932 ± 0.1041, highlighting the model’s strength in distinguishing between these conditions. Conversely, the lowest performance was seen in the control–depression–MCI classification, with an accuracy of 0.7114 ± 0.1513. In the three disorders comparison (
Table 10), the depression–MCI–schizophrenia classification yielded the highest accuracy of 0.8467 ± 0.1088, indicating effective differentiation among these disorders.
The results of the four-class classification indicate variability in the LDA algorithm’s performance across different combinations. The highest accuracy was achieved in the control–depression–schizophrenia–cognitive decline classification (57.89%), while the lowest accuracy was found in the depression–schizophrenia–Alzheimer–MCI classification (48.57%). These results highlight the challenges in distinguishing between certain conditions and demonstrate significant differences in model performance. Overall, the sensitivity, specificity, and accuracy values reflect the complexities involved in multi-class classification tasks.
Overall, LDA proved to be the most effective classification method across the board, while RF showed considerable variability and consistently lower performance across classification tasks and metrics. This may reflect the combination of a relatively small dataset, high feature dimensionality, and a reduced number of samples per class. Although hyperparameters such as the number of trees were optimized, these factors may have limited the classifier’s generalizability. Future studies should consider alternative balancing techniques, feature reduction strategies, or ensemble approaches to enhance RF performance.
According to the feature selection results shown in Figure 6, channel Fp1 was selected for all four ND (Alzheimer’s, depression, MCI, and schizophrenia). While the number of features selected for this channel varies, its consistent appearance across all conditions suggests its potential importance in distinguishing these disorders. Additionally, other channels such as Fp2, F7, and Fz are frequently selected, though not across all disorders. These shared channels might provide valuable insights for comparative analysis in multi-class classification tasks.
The frequent selection of frontal lobe channels (Fp1, Fp2, F3, F7, Fz) across the different ND in Figure 6 suggests that frontal regions of the brain may play a critical role in differentiating these disorders. The frontal lobe is responsible for higher cognitive functions such as decision-making, attention, and emotional regulation, which are often impaired in these disorders. EEG signals from this region may therefore be particularly informative for identifying changes in brain activity associated with these conditions. Many of the neurological and psychiatric disorders studied (e.g., depression, Alzheimer’s, schizophrenia, MCI) involve functional alterations in frontal lobe networks, which are reflected in EEG measures of complexity, entropy, and spectral power [
75,
78,
79,
80]. Prior studies have also shown that frontal EEG abnormalities are robust biomarkers in these conditions, supporting this study’s findings [
75,
78,
79,
80]. However, further work with larger and more diverse datasets is needed to confirm the generalizability of these results.