1. Introduction
Bipolar disorder (BD), often referred to as bipolar depression, is a complex mood disorder that causes significant shifts in a person’s mood, energy, activity levels, and concentration [
1]. These mood swings can be severe, making it difficult for individuals to manage their everyday lives. BD is characterized by alternating episodes of depression and periods of unusually high or elevated mood, with each episode lasting anywhere from several days to weeks. Approximately 40 million people worldwide were living with BD as of 2019, based on estimates from the Global Burden of Disease Study [
2]. While BD affects various aspects of personal, social, and cognitive functioning, the exact cause remains elusive [
3].
Currently, the diagnosis of BD depends primarily on self-reported symptoms and evaluations by healthcare professionals, following the guidelines of standard diagnostic manuals, including the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5) and the International Statistical Classification of Diseases and Related Health Problems, Eleventh Revision (ICD-11) [
4]. While these methods are valuable, they have limitations, which can delay accurate diagnoses. People may struggle to recognize their symptoms, avoid seeking help due to stigma, or fail to distinguish BD from other similar conditions—all of which can result in missed or delayed diagnoses [
5]. For instance, major depressive disorder (MDD) often mimics depressive episodes of BD, leading to the underdiagnosis of the manic or hypomanic states essential for a BD diagnosis [
6]. This difficulty is compounded by the fact that individuals may not recognize or report hypomanic behavior, often due to the lack of insight associated with subthreshold symptoms of hypomania [
7]. Similarly, borderline personality disorder (BPD) shares symptoms such as brief episodes of intense emotions—including irritability and euphoria—complicating differential diagnosis [
8,
9]. Additionally, attention-deficit/hyperactivity disorder (ADHD) may present with symptoms like hyperactivity and impulsivity, which overlap with manic episodes, further challenging diagnostic clarity [
9].
There is growing evidence linking neurological changes to BD [
10]. Studies using neuroimaging techniques, particularly comparing BD patients to healthy controls (HCs), have uncovered significant differences in brain structure and function [
11]. One such tool, the electroencephalogram (EEG), measures the brain’s electrical activity and primarily reflects the synchronized activity of large populations of neurons. It provides an objective indicator of the brain’s functional state. EEG is particularly useful because it offers high temporal resolution, capturing brain activity in real time, which gives it an advantage over many other physiological measures [
12]. This capacity for real-time data makes EEG an essential tool for understanding brain function, detecting abnormalities, and supporting the diagnosis of mental health disorders [
13].
Machine learning (ML), a branch of artificial intelligence (AI), has become a powerful tool for analyzing large datasets. ML uses sophisticated algorithms to uncover patterns and make predictions. These algorithms have shown great promise in analyzing EEG data to detect subtle signals that might be missed by human clinicians, thereby improving the accuracy of mood state classification and BD diagnosis [
12]. In the context of classifying EEG signals, researchers have explored a range of classifiers, such as linear discriminant analysis (LDA), support vector machines (SVMs), k-nearest neighbor (KNN), logistic regression (LR), Bayesian classification (BC), and decision trees (DT), to improve classification accuracy, as discussed in related works [
14].
In BD studies, significant attention has been devoted to the role of brain oscillations and their relationship to EEG frequency bands, with findings indicating that abnormalities in the gamma and alpha frequency bands may be associated with disruptions in the brain’s GABAergic system and thalamus [
15]. However, researchers continue to debate whether these abnormalities stem from changes in brain wave frequencies or dysfunctions in specific brain regions. Some studies have attempted to leverage these EEG frequency band abnormalities to aid in BD diagnosis [
11]. Recent advancements in ML have achieved approximately 90% accuracy in predicting BD using EEG data [
11]. As EEG technology evolves, its potential as a reliable tool for BD detection in clinical practice is becoming increasingly viable. However, despite their promise, ML models are often described as black-box systems, where their decision-making processes are not fully transparent. This lack of interpretability can undermine trust, especially in sensitive fields like healthcare [
16]. To address these challenges, explainable AI (XAI) has emerged to provide clearer insights into how ML models reach their conclusions [
17]. For mental health professionals working with BD, XAI techniques like class activation maps (CAMs) [
18], Shapley additive explanations (SHAP), and layer-wise relevance propagation (LRP) [
19] can highlight the specific brain regions, time intervals, or frequency bands that most influence a model’s output. By increasing transparency, XAI builds trust in AI-powered diagnostic tools, ensuring their effective use in clinical practice. Integrating XAI with EEG-based BD detection enhances the interpretability of AI models and promotes their broader acceptance in mental health diagnosis [
20].
This study presents a framework for differentiating BD patients from HCs using EEG data. It investigates the Hjorth parameters (activity, mobility, and complexity) and employs ML algorithms to improve diagnostic accuracy. Additionally, the framework incorporates XAI techniques to highlight the key EEG features that influence classification decisions. To ensure clinical applicability and robustness, leave-one-subject-out cross-validation (LOSOCV) is used to evaluate model performance.
Related Works
A growing body of literature has explored the use of resting-state EEG to distinguish individuals with BD from HC participants, leveraging both traditional analysis and ML approaches [
11,
21]. For instance, a study by Kam et al. [
22] demonstrated that BD patients exhibit higher power in the beta and gamma frequency bands compared to HC participants, alongside abnormalities in neural coherence patterns. Furthermore, Arikan et al. [
23] investigated EEG differences between BD, BPD, and HC groups. While significant EEG differences were identified between HC participants and clinical groups, BD and BPD displayed overlapping patterns, with both disorders showing altered alpha and beta power.
ML models have been applied to enhance BD diagnostics using EEG features. Mateo-Sotos et al. [
24] employed an extreme gradient boosting (XGB) approach, achieving 94% accuracy in differentiating BD from HC participants using several non-linear features. Another study [
25] identified abnormalities in alpha activity, particularly in the frontocentral regions of the brain, as a potential biomarker for distinguishing BD in adolescents. Using spectral power and entropy measures, the study achieved high accuracy (95.8%) in classifying BD patients versus HC participants, further emphasizing the diagnostic potential of EEG in younger populations. Similarly, Wang et al. [
15] utilized Welch periodograms to extract EEG frequency band features, such as power, mean, variance, and Shannon entropy, for diagnosing bipolar depression. They applied multiple classifiers, including SVM, LDA, and self-organizing maps (SOM). Among these methods, SOM outperformed others with an accuracy of 97.62%, sensitivity of 98.7%, and specificity of 97.02%, demonstrating the potential of entropy-based features in automated BD detection. Additionally, Montgomery et al. [
26] explored synthetic datasets to train ML models for BD diagnosis, focusing on EEG parameters such as theta-alpha mean, beta frequency band mean, and coherence measures. The study validated the potential of computational approaches in simulating clinical conditions and achieved robust diagnostic accuracy of 92% using a multi-layer perceptron classifier. Recently, Ma et al. [
27] examined non-linear features derived from EEG signals, including fractal dimension and entropy measures, to capture the complexity and randomness in EEG data. They found entropy-based features to be the most effective, achieving a classification accuracy of 95.74%, sensitivity of 93.68%, and specificity of 96.33%.
While studies employing ML have shown promising results, to the best of our knowledge, most lack the integration of XAI techniques, which are crucial for interpreting model decisions and ensuring clinical applicability. This study aims to address these gaps in the detection of BD by integrating ML models with XAI techniques. This combination enhances both the interpretability of the models and their predictive accuracy. Utilizing a proprietary clinical dataset comprising a diverse group of BD patients and HC participants, the study evaluates the performance of various ML classifiers trained with Hjorth parameters. The models’ reliability and accuracy are validated through LOSOCV. Additionally, XAI features are incorporated to assist clinicians in understanding the decision-making process, offering valuable insights into the factors contributing to BD.
The rest of this paper is structured as follows: Section 2 describes the materials and methods, including details of the dataset, the proposed method, XAI, the extracted feature set, feature selection, ML algorithms, and statistical tests. The results of the proposed method are presented in Section 3. Section 4 and Section 5 provide a detailed discussion of this work and the conclusion of this paper, respectively.
3. Results
All analyses in this study were performed on a Windows PC with an Intel® Core™ i7-1065G7 processor (1.30 GHz) and 16 GB of RAM, using MATLAB 2024a. Preprocessing steps were carried out with EEGLAB v2024.0, and all other analyses were completed using custom MATLAB scripts and functions created specifically for this study.
In this study, EEG data were collected from 40 participants, equally divided between BD patients and HCs, with 50 epochs per participant, resulting in a total of 2000 epochs. Our analysis framework extracted 285 features from 19 electrodes, covering three Hjorth parameters (activity, mobility, and complexity) and five frequency bands (delta, theta, alpha, beta, and gamma). Following feature selection, 12 features were selected as the optimal set, yielding the highest accuracy. To ensure robust model evaluation, we implemented LOSOCV, where each iteration used 39 participants’ data for training while testing on the remaining participant. This approach allowed us to assess the generalizability of our feature selection methods and classification performance across different frequency bands, setting the foundation for our detailed analysis of discriminative EEG patterns between BD patients and HCs.
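As an illustration of the feature extraction described above, the three Hjorth parameters of one band-filtered epoch can be computed from the variances of the signal and of its first and second differences. The sketch below is written in Python rather than the MATLAB used in this study and is not the authors' implementation; the 250 Hz sampling rate and the synthetic 20 Hz test signal are assumptions made only for the example.

```python
import math


def _variance(xs):
    """Population variance of a sequence of samples."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def _diff(xs):
    """First difference, a discrete stand-in for the time derivative."""
    return [b - a for a, b in zip(xs, xs[1:])]


def hjorth_parameters(signal):
    """Return (activity, mobility, complexity) for a 1-D epoch."""
    d1 = _diff(signal)
    d2 = _diff(d1)
    var0, var1, var2 = _variance(signal), _variance(d1), _variance(d2)
    activity = var0                                   # signal power
    mobility = math.sqrt(var1 / var0)                 # proxy for mean frequency
    complexity = math.sqrt(var2 / var1) / mobility    # mobility(d1) / mobility(signal)
    return activity, mobility, complexity


FS = 250  # hypothetical sampling rate in Hz (assumption for this example)
# Synthetic 2 s epoch: a pure 20 Hz sinusoid, i.e. inside the beta band.
epoch = [math.sin(2 * math.pi * 20 * n / FS) for n in range(2 * FS)]
activity, mobility, complexity = hjorth_parameters(epoch)

# Feature count reported in the text: 19 electrodes x 3 parameters x 5 bands.
N_FEATURES = 19 * 3 * 5
```

For a pure sinusoid the complexity is close to 1, its minimum, and mobility grows with frequency, which is why these parameters are usually computed per frequency band.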
The classification performance results are presented in Table 1, Table 2 and Table 3.
Table 1 presents the performance metrics of different classifiers analyzing Hjorth activity parameters across various EEG frequency band combinations for distinguishing between BD and HC participants. The linear-SVM classifier demonstrated superior performance, achieving the highest classification accuracy of 92.05% (sensitivity: 90.10%, specificity: 94%) in the beta–gamma band combination. This was closely followed by its performance in the delta–beta band, with 90.30% accuracy (sensitivity: 90%, specificity: 90.60%). The RF classifier showed strong performance in similar frequency bands, achieving 87.45% accuracy in beta–gamma and 88% in delta–beta combinations. While the LDA classifier maintained moderate performance levels, with its best results in the beta–gamma band (84.50% accuracy, 92.30% sensitivity, 76.70% specificity), it consistently underperformed compared to linear-SVM. The KNN classifier generally showed the lowest performance across most frequency bands, though it achieved notable accuracy in the beta–gamma band (85%). Notably, all classifiers demonstrated their strongest performance in the beta–gamma frequency band combination, suggesting this might be the most informative frequency range for distinguishing between BD and HC participants.
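To make the three reported metrics concrete, the sketch below computes accuracy, sensitivity, and specificity from confusion-matrix counts. It assumes that BD is treated as the positive class and that epochs are pooled over all LOSOCV folds (1000 BD and 1000 HC epochs); the counts themselves are hypothetical values chosen to match the beta–gamma linear-SVM figures, not numbers taken from the study's tables.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts.

    tp/fn: BD epochs classified correctly/incorrectly (BD assumed positive).
    tn/fp: HC epochs classified correctly/incorrectly.
    """
    sensitivity = tp / (tp + fn)          # true-positive rate
    specificity = tn / (tn + fp)          # true-negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity


# Hypothetical pooled counts: 20 subjects x 50 epochs per group.
acc, sens, spec = classification_metrics(tp=901, fn=99, tn=940, fp=60)
print(f"accuracy={acc:.2%}  sensitivity={sens:.2%}  specificity={spec:.2%}")
```

Because the two groups are the same size, accuracy is simply the mean of sensitivity and specificity: (90.10% + 94%) / 2 = 92.05%.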
The analysis of Hjorth parameter combinations in the beta band reveals distinctive patterns in classification performance across different ML algorithms, as shown in
Table 2. The experimental results demonstrate that parameter selection significantly influences classification accuracy, with certain combinations consistently outperforming others. Linear-SVM emerged as the most robust classifier, maintaining high accuracy across most parameter combinations, with its peak performance of 90.05% (sensitivity: 88.40%, specificity: 91.70%) being achieved using the activity–mobility combination. Interestingly, the addition of the complexity parameter did not substantially enhance performance, as evidenced by the marginally lower accuracy of 89.80% (sensitivity: 88.60%, specificity: 91%) when using all three parameters combined. The RF classifier showed particularly strong performance with the activity–complexity combination, reaching 86.80% accuracy (sensitivity: 84%, specificity: 89.60%), while its performance dropped notably to 77.45% with activity–mobility. The KNN classifier demonstrated consistent performance across different parameter combinations, achieving 85.05% accuracy (sensitivity: 80.40%, specificity: 89.70%) with activity–complexity and 82.55% (sensitivity: 75.70%, specificity: 89.40%) with activity–mobility. The LDA classifier maintained steady performance around 79% accuracy across most combinations, showing particular stability with the activity–mobility–complexity combination (79.85% accuracy, sensitivity: 82.40%, specificity: 77.30%). A striking observation across all classifiers was the markedly poor performance when using only the mobility–complexity combination: linear-SVM showed its lowest accuracy at 43.25%, while RF achieved the highest in this category at just 65.50%. This consistent pattern suggests that the activity parameter plays a fundamental role in discriminating between BD and HC participants, and that its inclusion is crucial for achieving reliable classification results.
Table 3 provides the classification performance of various classifiers for Hjorth parameter combinations in the gamma band. The linear-SVM classifier demonstrated the best performance overall. For the complete set of Hjorth parameters (activity–mobility–complexity), it achieved the highest accuracy among the classifiers, 81.95% (sensitivity: 77.90%, specificity: 86.00%). It was also the most accurate classifier for the activity–complexity (83.80%) and activity–mobility (83.60%) combinations, with comparable sensitivity and specificity values. The LDA classifier showed consistent performance across combinations, achieving its highest accuracy (77.95%) in the activity–complexity combination, closely followed by activity–mobility (77.80%). When using all three parameters, its accuracy was 74.85%, slightly outperforming RF (74.50%) and KNN (74.35%) for the same combination. RF and KNN demonstrated similar performances across combinations. RF achieved its best accuracy (75.70%) for the activity–mobility combination, with balanced sensitivity (74.60%) and specificity (76.80%). Meanwhile, KNN’s highest accuracy (75.85%) was observed with the activity–complexity combination. For the mobility–complexity combination, performance across classifiers dropped significantly. Linear-SVM showed an accuracy of 53.00%, reflecting the challenge of using only these parameters. LDA, however, achieved relatively higher accuracy (64.65%), outperforming RF (56.15%) and KNN (57.80%). Overall, the results indicate that parameter combinations involving activity, whether with mobility, complexity, or both, consistently yield better classification performance than the mobility–complexity pairing. Linear-SVM emerged as the most effective classifier across these combinations, combining high accuracy with balanced sensitivity and specificity in the gamma band.
LDA, while not outperforming linear-SVM, showed reliable and competitive results, particularly in the activity–complexity combination, where it achieved its highest accuracy (77.95%) with sensitivity and specificity values that were close to linear-SVM’s performance. This suggests that LDA can be a robust alternative when computational simplicity is a priority. RF and KNN exhibited similar performance trends, achieving respectable accuracy levels but falling short of linear-SVM and LDA. Their performances were consistent across combinations, with RF performing slightly better in activity–mobility (75.70%) and KNN achieving its peak in activity–complexity (75.85%). These results demonstrate their effectiveness in moderately complex tasks but highlight their limitations in fully utilizing the discriminative power of Hjorth parameters. The mobility–complexity combination, on the other hand, presented notable challenges for all classifiers, resulting in significantly lower accuracies compared to other combinations. The accuracy of linear-SVM dropped to 53.00%, indicating that this parameter pairing lacks the discriminative features necessary for effective classification. Interestingly, LDA outperformed the other classifiers for this combination, achieving 64.65%, which highlights its resilience in scenarios where feature quality is limited.
In both the beta and gamma bands, incorporating activity consistently proved essential for robust classification using Hjorth parameters, whereas relying solely on mobility and complexity led to markedly lower accuracies. Across all tested classifiers, linear-SVM emerged as the top performer, achieving peak accuracy of 90.05% in the beta band (activity–mobility) and 81.95% in the gamma band (all three parameters). Notably, adding complexity to activity–mobility did not substantially boost performance in the beta band. LDA demonstrated reliable, competitive results, particularly for activity–complexity, often placing second behind linear-SVM and showing its resilience when feature quality was limited (e.g., outperforming other methods with mobility–complexity alone in the gamma band). RF and KNN exhibited moderate and similar performance trends, typically yielding accuracies in the mid-70% to mid-80% range across various combinations.
These findings highlight two key insights: (1) the Hjorth activity parameter plays a crucial role in distinguishing the studied conditions and (2) the linear-SVM classifier effectively leverages Hjorth parameters to achieve reliable, high-accuracy classification. Consequently, it is important to determine which EEG frequency band’s activity parameter offers the most effective biomarker for diagnosing BD.
Table 4 summarizes the classification performance of various ML algorithms, using the activity parameter in different EEG sub-bands, by reporting the accuracy, sensitivity, and specificity for both BD and HC participants. Linear-SVM achieves the best overall performance, particularly in the beta sub-band, with an accuracy of 88.65%, a sensitivity of 86.60%, and a specificity of 90.70%. RF also performs well in the beta sub-band, showing similarly high metrics (accuracy = 87.75%, sensitivity = 84.10%, specificity = 91.40%). Although LDA and KNN achieve moderate results, they generally fall short of the performance levels demonstrated by linear-SVM and RF. These findings indicate that the Hjorth activity parameter, especially in the beta sub-band, may serve as a strong biomarker for identifying BD, given its consistently high accuracy, sensitivity, and specificity across classifiers.
Building upon the above classification insights, a closer examination of the Hjorth activity parameter in the beta frequency band reveals significant differences between BD and HC groups (Figure 2A). The BD group demonstrated consistently higher values in the activity parameter compared to the HC group within the beta frequency band. This elevated parameter suggests increased neural activation patterns and potential alterations in motor coordination characteristics associated with BD.
Topographical analysis (Figure 2B,C) of the spatial distribution of the Hjorth activity parameter in the beta frequency band across the scalp showed distinct patterns of difference between the two groups. The most pronounced differences in the beta activity parameter were observed in the frontal and frontocentral regions, particularly in electrodes F3 (p-value = [value missing], z-value = [value missing]) and F4 (p-value = [value missing], z-value = [value missing]). These channels exhibited significantly higher activity parameter values in BD patients compared to HC participants. The pronounced differences in Hjorth parameters in the frontal regions align with the established role of these areas in executive function and emotional regulation.
Figure 2B also shows the mean differences in beta activity; however, our analysis extends beyond these specific channels to investigate whether other brain regions also contribute to high-performance BD diagnosis.
Table 4 presents the diagnostic performance of our model using Hjorth activity parameters across various frequency bands, demonstrating that the beta band alone achieves nearly 90% accuracy in detecting BD. Moreover, combining the beta and gamma bands boosts the diagnostic accuracy to 92.05% (Table 1). These findings underscore the critical importance of the beta and gamma bands as key biomarkers for BD, given that BD patients exhibit significantly higher activity levels in these frequency ranges, particularly in frontal channels, than healthy individuals.
To elucidate the contribution of Hjorth activity parameters in our XAI-driven approach, we employed LIME to assess how different brain regions influence the model’s ability to distinguish BD patients from HCs.
Figure 3 presents LIME-based visualizations of these influences, with areas in darker red signifying stronger contributions to the classifier’s decisions.
In the beta band, frontal channels emerged as the primary discriminators, with F4, Fz, F3, and right prefrontal (Fp2) channels showing especially high contributions. In contrast, the gamma band revealed a pronounced involvement of left frontal and central areas, as well as notable contributions from the occipital region (O1). The most influential channels were F3, O1, and F4.
Taken together, these LIME findings highlight distinct beta- and gamma-band signatures that differentiate BD patients from HCs. While beta-band activity in frontal regions supplies critical diagnostic information, gamma-band contributions are more localized to left-hemisphere frontal and occipital areas. This hemispheric asymmetry and frequency-specific distribution align with prior neurophysiological research on BD, underscoring the potential of frontal and occipital regions as key targets for future diagnostic advances and therapeutic interventions.
To identify the optimal ML method for diagnosing BD, we performed a thorough feature selection and model comparison. Using mRMR, we narrowed the feature set down to 12 key features that included Hjorth parameters (activity, mobility, and complexity) and spectral characteristics from the beta and gamma bands. These features were highly effective in distinguishing BD from HC participants. We then evaluated multiple ML classifiers (linear-SVM, KNN, LDA, and RF) to ensure robust model selection and address potential issues with generalization. Among these, the linear-SVM achieved the highest performance, with an AUC of 0.9205 in ROC analysis, followed by KNN and RF (
Figure 4). We selected diverse classifiers due to their distinct strengths in handling varied data characteristics, aiming to demonstrate the importance of our features. Notably, the linear-SVM’s accuracy was highly sensitive to the number of features: when the feature set was reduced to 2, 4, or 6, performance declined, underscoring the importance of comprehensive feature selection. In particular, the linear-SVM model that utilized Hjorth activity and mobility features from the beta and gamma bands consistently outperformed the other classifiers. Validation via LOSOCV further demonstrated its robustness and generalizability, showing stable ROC-AUC values across validation folds.
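For readers less familiar with ROC analysis, the AUC reported above can be read as the probability that a randomly chosen BD epoch receives a higher decision score than a randomly chosen HC epoch. The sketch below computes it directly from that definition; it assumes that decision scores are pooled over all test epochs and that BD is labeled 1, and the scores shown are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    Labels are 1 (positive, here assumed to be BD) or 0 (negative, HC).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample from each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical decision scores for six epochs (three BD, three HC).
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)  # 8 of the 9 BD/HC pairs are ranked correctly
```

The pairwise loop is quadratic in the number of epochs, which is acceptable for 2000 epochs; a rank-based formulation gives the same value more efficiently.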
4. Discussion
In this study, our objective was to investigate the potential of EEG Hjorth parameters to distinguish between individuals with BD and HCs. The main findings show that specific Hjorth parameters, particularly activity and mobility, differed significantly between BD patients and HCs, specifically in the beta and gamma bands. Notably, BD patients exhibited higher activity compared to HCs, indicating alterations in brain dynamics consistent with the literature on BD-related neurophysiological changes. These results align with previous studies reporting disturbances in brain oscillations in BD [
25,
32].
Increased cortical excitability is linked to high beta activity, which is generally thought to play a facilitating role [32]. The literature indicates that elevated emotional tension correlates with an increase in beta power, particularly in the anterior region [31, 32].
The proposed method achieved its best performance by combining beta and gamma frequency bands. This combination outperformed all single-band cases and yielded accuracy, sensitivity, and specificity values of 92.05%, 90.10%, and 94%, respectively (using LOSOCV). Further statistical and XAI analyses revealed that anterior electrodes contributed the most to this classification (
Figure 3). These findings regarding frequency bands and spatial contributions are consistent with previous studies on BD patients [
32].
In our study, the beta band (13–30 Hz) achieved a higher classification accuracy (88.65%) compared to the gamma band (30–45 Hz), which showed 81.95%. This indicates the importance of the beta frequency band in BD patients [32].
LIME analysis identified the frontal lobes in the beta band and both the frontal and occipital regions in the gamma band as highly significant (
Figure 3). These beta and gamma frequency bands in the anterior and occipital channels are critical for understanding and diagnosing BD. Within the frontal lobe, BD patients generally show increased beta and gamma activity compared to HCs. A previous study [
43] has shown that BD patients exhibit deficits in gamma-band oscillations, which may be linked to dysfunction in GABAergic inhibitory interneuronal activity. The generation of gamma oscillations depends on synaptic GABA neurotransmission, which is essential for coordinating neural network activities across various brain regions. Cortical gamma activity plays an important role in processes such as sensory perception, memory, and problem solving [
32].
In contrast, the occipital lobe normally exhibits a prominent alpha peak, but BD patients display abnormalities in both the beta and gamma frequency ranges. Occipital gamma activity is also relevant, though it manifests differently: when processing emotional faces, BD patients may show abnormal gamma responses indicating a hyperactive reaction to visual stimuli [
44]. The observed increase in occipital beta power during manic episodes may function as a compensatory mechanism for impaired alpha responses [
32]. Consequently, this elevation in beta-band power could be associated with a reduction in alpha activity [
32].
To the best of our knowledge, few studies have applied XAI to BD patients using EEG data, particularly with a reliable validation method such as LOSOCV. The findings of this study demonstrate that XAI and statistical tests provide robust validation for the effective use of Hjorth parameters as a reliable criterion for identifying bipolar patterns. The results show that features extracted from frontal channels exhibit distinctiveness, which contributes to the development of a robust diagnostic system based on these features. Additionally, various established classifiers were employed to achieve optimal classification accuracy and highlight the importance of these features.
The initial phase of this study examined the effectiveness of Hjorth parameters in disease classification, supported by XAI results and statistical tests. As shown in Figure 3, the LIME analysis results emphasize that the frontal electrodes of the EEG play a crucial role in the classification process.
The findings of our research hold significant clinical implications. The ability to reliably classify BD from HCs using EEG signals could improve early diagnostic processes. The ML model, supported by the explainability provided by XAI techniques, such as LIME, offers clinicians transparency regarding the most influential features in decision making. This transparency is essential for building trust in AI-powered diagnostic tools, ensuring their safe and ethical integration into clinical practice. Furthermore, the observed changes in Hjorth activity may serve as potential biomarkers for BD. Clinicians could use these EEG features, in conjunction with traditional diagnostic methods, to gain deeper insights into the neurophysiological mechanisms underlying BD, leading to more accurate diagnoses. Moreover, this approach could be extended to help differentiate BD from other psychiatric disorders, such as MDD, schizophrenia, and obsessive compulsive disorder, which often share overlapping symptoms.
To demonstrate the effectiveness of our model, we compared its per-subject accuracy to a baseline of 50% (chance level for this balanced dataset), using the binomial test to calculate a p-value for each subject from that subject's epochs. This test evaluates whether the observed classification accuracy is significantly better than the accuracy expected by chance. The binomial test suits this subject-specific analysis because each epoch yields a binary (correct or incorrect) outcome. The majority of p-values from our LOSOCV analysis of the beta–gamma band combination using linear-SVM are exceedingly small ([value missing] for 20 subjects), providing very strong evidence that the model’s performance surpasses chance. A minority of p-values, such as [values missing], are larger but remain statistically significant at conventional thresholds ([value missing]). However, two subjects yielded higher p-values ([value missing]), suggesting that for these individuals the model’s performance was indistinguishable from chance. These outliers may reflect variability in subject-specific characteristics or other factors influencing classification outcomes. Overall, these results provide strong statistical support that the model performs significantly better than chance for the majority of subjects (38 out of 40, or 95% of subjects), indicating genuine predictive power rather than random variation. If these subjects are considered valid representations, the accuracy and precision of the method become significantly more dependable, particularly in differentiating between BD and HC.
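The per-subject test described above can be sketched as follows. This is a minimal illustration, not the study's code: it assumes a one-sided alternative (accuracy greater than chance) and 50 epochs per subject, and the numbers of correctly classified epochs are hypothetical.

```python
from math import comb


def binomial_p_value(n_correct, n_epochs, chance=0.5):
    """One-sided p-value P(X >= n_correct) for X ~ Binomial(n_epochs, chance)."""
    return sum(
        comb(n_epochs, k) * chance**k * (1 - chance) ** (n_epochs - k)
        for k in range(n_correct, n_epochs + 1)
    )


# Hypothetical subjects with 25, 32, and 46 of 50 epochs classified correctly.
for n_correct in (25, 32, 46):
    p = binomial_p_value(n_correct, 50)
    print(f"{n_correct}/50 correct -> p = {p:.3g}")
```

With only 50 epochs per subject, a subject needs well over half of the epochs classified correctly before the result becomes significant at the 0.05 level, which is one reason a few subjects can remain indistinguishable from chance even when the pooled accuracy is high.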
The findings of our study demonstrate that the linear-SVM algorithm is a highly effective tool for diagnosing BD, achieving an accuracy of 92.05%, with a sensitivity of 90.10% and specificity of 94%, using Hjorth activity parameters extracted from beta and gamma frequency bands. This result aligns with the range reported in similar studies, such as Khaleghi et al. [
25], who achieved 95.8% accuracy using alpha-band EEG power and entropy features via a KNN classifier (k = 3), and Mateo-Sotos et al. [
24], who reported 94% accuracy using an XGB classifier combined with linear and non-linear features. Additionally, another study [
26] achieved 92% accuracy in BD diagnosis using an MLP classifier. A key difference between our study and those mentioned is the cross-validation approach; while most relied on k-fold or splitting methods, our study used the LOSOCV method, which offers a fundamentally different and more rigorous evaluation strategy.
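The difference between LOSOCV and epoch-level splitting can be made concrete with a short sketch. The splitter below holds out every epoch of one subject per fold, so no subject contributes to both training and testing. The 40 subjects and 50 epochs per subject follow this study, while the integer subject IDs and the function name are hypothetical.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for each subject.

    subject_ids[i] is the subject that epoch i belongs to.
    """
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


# 40 subjects x 50 epochs = 2000 epochs; subject IDs 0..39 are placeholders.
subject_ids = [subject for subject in range(40) for _ in range(50)]
folds = list(leave_one_subject_out(subject_ids))

for held_out, train, test in folds:
    # Each fold trains on 39 subjects (1950 epochs) and tests on 1 (50 epochs).
    assert len(train) == 1950 and len(test) == 50
```

Any feature selection or scaling would need to be fitted inside each fold on the training indices only; otherwise information from the held-out subject leaks into training.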
Our study is distinguished by its use of the LOSOCV method. Unlike epoch-level k-fold cross-validation, which can overestimate performance when epochs from the same subject appear in both the training and test sets, LOSOCV ensures that the model is tested on entirely unseen subjects, thereby more closely mimicking real-world clinical conditions. This method has rarely been used, one exception being a study by Wang et al. [
15], which reported an accuracy of 97.62% using statistical features. Additionally, our incorporation of XAI offers significant advantages. While previous studies have employed conventional classifiers and a range of linear and non-linear features, their methods often lack interpretability. In contrast, our XAI approach identified the most discriminative features (Hjorth activity parameters in beta and gamma bands) and highlighted the role of anterior brain regions in BD detection. This provides valuable insights into the neurophysiological underpinnings of BD. Our model emphasizes interpretability and real-world applicability through LOSOCV, ensuring it remains both practical and clinically relevant. These results underscore the utility of beta and gamma frequency bands as biomarkers for BD, highlighting the potential of this framework as a reliable diagnostic tool.
The accuracy achieved in our study is consistent with the range reported in the literature [
15,
24,
25,
26,
27], which varies from 85% to 97%, depending on the methodology and dataset. Studies employing advanced and complex features, such as multiscale entropy (Mateo-Sotos et al. [
24]), have achieved comparable or higher performance, but require more computational time. In contrast, our proposed framework is computationally efficient due to its straightforward use of Hjorth features, making it more practical for clinical applications.
Despite these promising results, our study has some limitations. The relatively small sample size of 40 participants, although consistent with similar previous studies, limits the generalizability of our findings to broader populations. Another limitation of our study is the absence of a comparative group comprising individuals diagnosed with MDD. While our findings focus exclusively on individuals with BD, the inclusion of a comparative MDD group could offer additional insights into the differential characteristics and mechanisms underlying depressive episodes in these distinct clinical populations. Future studies could prioritize larger and more diverse cohorts and incorporate advanced algorithms to enhance the assessment of depressive and related disorders. These efforts would contribute to a more comprehensive understanding of the nuances between BD and MDD in subsequent investigations. Additionally, while Hjorth parameters provide a computationally efficient and interpretable feature set, they may not fully capture the complexity of non-linear neural dynamics. Advanced feature extraction techniques, such as entropy-based or fractal analyses, combined with Hjorth activity features, could be explored to enhance sensitivity to subtle patterns in the data. Addressing these limitations will be crucial for refining diagnostic tools and ensuring their applicability in diverse clinical settings.