Article

DBN: A Dual-Branch Network for Detecting Multiple Categories of Mental Disorders

School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China
*
Authors to whom correspondence should be addressed.
Information 2025, 16(9), 755; https://doi.org/10.3390/info16090755
Submission received: 13 July 2025 / Revised: 28 August 2025 / Accepted: 29 August 2025 / Published: 31 August 2025

Abstract

Mental disorders (MDs) constitute significant risk factors for self-harm and suicide. The incidence of MDs has been increasing annually, primarily due to inadequate diagnosis and intervention. Early identification and timely intervention can effectively slow the progression of MDs and enhance the quality of life. However, the high cost and complexity of in-hospital screening exacerbate the psychological burden on patients. Moreover, existing studies primarily focus on the identification of individual subcategories and lack attention to model explainability. These approaches fail to adequately address the complexity of clinical demands. Early screening of MDs using EEG signals and deep learning techniques has demonstrated simplicity and effectiveness. To this end, we constructed a Dual-Branch Network (DBN) leveraging resting-state Quantitative Electroencephalogram (QEEG) features. The DBN is designed to enable the detection of multiple categories of MDs. Firstly, a dual-branch feature extraction strategy was designed to capture multi-dimensional latent features. Further, we propose a Multi-Head Attention Mechanism (MHAM) that integrates dynamic routing. This architecture assigns greater weights to key elements and enhances information transmission efficiency. Finally, the diagnosis is derived from a fully connected layer. In addition, we incorporate SHAP analysis to facilitate feature attribution. This technique elucidates the contribution of significant features to MD detection and improves the transparency of model predictions. Experimental results demonstrate the effectiveness of DBN in detecting various MD categories. The performance of DBN surpasses that of traditional machine learning models. Ablation studies further validate the architectural soundness of DBN. The DBN effectively reduces screening complexity and demonstrates significant potential for clinical applications.

1. Introduction

Mental disorders (MDs) is the collective term for mental illnesses caused by disturbances of brain functional activity [1], including Anxiety Disorder (AnxD), Obsessive–Compulsive Disorder (OCD), and Schizophrenia (SCZ). MDs significantly impair quality of life and represent major risk factors for suicide and self-harm [2]. Stigmatization and impaired social functioning further exacerbate the condition. Due to disease-related stigma, self-denial, misdiagnosis, and underdiagnosis, many cases remain undetected and inadequately treated. Consequently, the prevalence of MDs continues to rise [3].
Early identification and timely intervention are highly effective in mitigating disease progression and improving prognosis [4]. However, in-hospital screening remains time-consuming, operationally cumbersome [5], and financially burdensome [6], making it difficult to expand coverage to high-risk populations. In addition, fear of stigmatization reduces patients’ willingness to seek medical care [7]. These challenges severely impede the prevention and treatment of MDs. Therefore, enhancing the efficiency of MD detection has become a pressing issue in addressing the global mental health crisis.
Developing deep neural networks (DNNs) based on electroencephalogram (EEG) signals can enhance detection efficiency and reduce screening costs. EEG provides feedback on the electrical activity and potential disorders of the human brain [8] and is the basis for MD screening. DNNs can effectively model the nonlinear features of EEG signals, providing reliable decision support [9]. Numerous studies have been conducted in this domain. Typical applications include studies on depressive disorder [10], SCZ [11], and post-traumatic stress disorder (PTSD) [12]. However, the large number of MD categories [1] poses challenges in adapting specific detection networks to diverse disorders. As a result, generalizing predictions across different MD categories remains difficult. Furthermore, the “black-box” nature of DNNs limits the explainability of results [13], hindering clinical acceptance.
In this paper, we propose a Dual-Branch Network (DBN). The DBN utilizes resting-state quantitative EEG (QEEG) features to detect multiple categories of MDs. Additionally, SHAP analysis based on cooperative game theory [14] is incorporated for feature attribution, enhancing the explainability of detection results. Our main contributions are summarized as follows:
(1)
A dual-branch feature extraction strategy is proposed to address the limitations of single-branch architectures, thereby providing more comprehensive feature representations for detection.
(2)
A Multi-Head Attention Mechanism (MHAM) with dynamic routing is introduced to enhance attention to key elements and optimize the feature space hierarchy, thereby improving information transmission efficiency.
(3)
SHAP analysis improves the explainability of DBN’s detections and increases the potential for clinical application.

2. Related Work

As stated earlier, neural networks offer a viable solution for the efficient and reliable detection of MD, which is an ongoing area of focus. Typical architectures include Multilayer Perceptron (MLP) and Convolutional Neural Networks (CNNs). Such models can automatically extract latent features from physiological signals and have thus received extensive attention.
The MLP is a classical classification model commonly used for MD detection, most often for anxiety and depression. Bagheri et al. [15] developed an MLP to identify depression and anxiety in Iranian women with social dysfunction, providing essential support for the development of targeted intervention initiatives. Mohamed et al. [16] studied MDs in populations in conflict zones, using an MLP to detect different stages of anxiety with favorable results. A salient issue with the MLP is that its parameters must be determined through repeated trial and error, which carries a risk of overfitting. Researchers have therefore attempted to improve MLP performance with swarm intelligence and evolutionary algorithms. For instance, chicken swarm algorithms [17] and genetic algorithms [18] have been used to optimize parameter selection, improving accuracy while reducing overfitting. However, since the MLP is not sensitive to spatial or temporal structure, optimizing parameter selection alone does not fundamentally break the performance bottleneck.
CNNs have become a popular choice for detecting MDs due to their superior performance in extracting local features. In Major Depressive Disorder (MDD) research, Uyulan et al. [19] processed EEG signals as grayscale images and used them to construct an adaptive learning model with a CNN architecture. The accuracy of the model was significantly improved compared to traditional approaches. Wang et al. [20] constructed a 3D Densely Connected CNN using magnetic resonance images of brain structures. It effectively extracted brain structural variability between MDD and healthy controls, with excellent performance. Besides MDD, CNNs are also common in Attention Deficit Hyperactivity Disorder (ADHD) screening. Wang et al. [21] constructed a CNN based on the independent components of functional MRI (fMRI) in individuals with ADHD. It effectively distinguished implicit feature variations and outperformed classical methods such as logistic regression and support vector machines. Akbugday et al. [22] proposed a CNN classifier for screening ADHD based on an EEG topographic feature map, which outperformed machine learning models. Furthermore, CNNs also shone in detecting PTSD [23], sleep disorders [24], and psychological disorders [25]. The CNN has powerful feature modeling capabilities. However, it is less efficient in processing tabular features, with a risk of losing global information.
The emergence of novel neural networks also offers ideas for designing MD detection architectures. The MHAM and the Transformer derived from it [26] are typical cases. Xia et al. [27] combined the MHAM and the CNN to classify MDD from EEG. The MHAM learned the latent connectivity relationships of EEG channels; features were then extracted by the CNN and fed into a fully connected layer for detection. The study achieved an average accuracy of 91.06%, which was superior to the comparison methods. The Transformer, which is built around the MHAM, has also attracted attention for its excellent feature modeling performance. Aina et al. [28] designed an ensemble learning architecture that incorporates a CNN and Vision Transformers. It predicted depression, anxiety, and other conditions from a facial emotion recognition dataset with an accuracy of 81%. Notably, Aina et al. paid attention to explainability, highlighting the image regions that contributed to the decision. The MHAM is effective in reducing redundancy, but its ability to model hierarchical relationships among critical features to improve the efficiency of information transmission remains under-examined.
Current research in MD detection focuses on analyzing brain physiological signals, commonly fMRI and EEG. We chose EEG because it is inexpensive and easy to obtain, which reduces the cost of patient screening and streamlines the screening process. In constructing models, most existing methods rely on a single feature extraction pathway to detect an individual category of MD, which leads to inadequate feature learning. Meanwhile, clinical practice frequently faces more complex disease conditions and comorbidities [29,30]. The effectiveness of such models in detecting different MDs is therefore unknown, making it difficult to transfer them directly to clinical use. In addition, clinical practice requires a high degree of transparency in the detection results. However, most researchers tend to emphasize accuracy improvement and pay insufficient attention to explainability. Therefore, we construct the DBN using QEEG features to improve the detection accuracy and explainability for MDs.

3. Methods

3.1. The DBN Overall Architecture

The DBN primarily consists of two parts: a dual-branch feature extraction strategy and the MHAM with dynamic routing, as illustrated in Figure 1. In the dual-branch feature extraction strategy, the MLP models the global complex nonlinear relationships of the QEEG features, and the 1D-CNN models their local features. We use the power spectral density (PSD) and coherence value of each band of the QEEG signal to detect MDs. These features interact across sensor locations, i.e., they carry implicit spatiotemporal relations. The 1D-CNN models the local features of such relations. In addition, considering the tabular nature of QEEG features, we introduce the MLP to model the global relations. The two feature extractors complement each other to provide more comprehensive information for MD detection.
The outputs of the two branches are fused, and the DBN then applies the MHAM with dynamic routing. The MHAM runs multiple independent attention mechanisms in parallel to highlight key information, thus obtaining the attention distribution in different subspaces of the input sequence. Then, the dynamic routing optimizes the spatial hierarchy of attention features to ensure efficient information delivery. The fully connected layer adapts the output dimension of the dynamic routing. Finally, the MD detection result is obtained by Softmax computation. The following subsections provide a detailed description of the key components of the DBN.

3.2. Dual-Branch Feature Extraction Strategy

As shown in Figure 1, the MLP and the 1D-CNN extract global nonlinear relationships and local features from the inputs, respectively, to achieve feature modeling. Denote the raw QEEG features as $X = [x_1, x_2, \ldots, x_\varphi]$, $x_i \in \mathbb{R}^n$, $i \in [1, \varphi]$, where $n$ is the feature dimension and $\varphi$ is the batch size. The MLP branch output $O_{\mathrm{MLP}}$ is given by the following equation:
$$O_{\mathrm{MLP}}(X) = W^{(3)}\left(f^{(2)}\left(W^{(2)}\left(f^{(1)}\left(W^{(1)} X + b^{(1)}\right)\right) + b^{(2)}\right)\right) + b^{(3)} \tag{1}$$
where $W^{(j)}$, $j \in [1, 3]$, is the MLP weight parameter for each layer and $b^{(j)}$ is the bias term. $f$ denotes the batch normalization and GELU activation operations performed on the hidden layer output, computed as
$$f^{(j)} = \mathrm{GELU}^{(j)} \circ \mathrm{BatchNorm}^{(j)} \tag{2}$$
where $\circ$ denotes a back-to-front series calculation operation, i.e., function composition applied from right to left.
In the 1D-CNN branch, the raw feature $X$ is convolved in three layers to get the output $O_{\mathrm{conv}}$, computed as
$$O_{\mathrm{conv}}(X) = \left(F_c^{(3)} \circ F_c^{(2)} \circ F_c^{(1)}\right)(X) \tag{3}$$
where $F_c^{(j)}$, $j \in [1, 3]$, denotes performing convolution, batch normalization, and ReLU activation on the input, computed as
$$F_c^{(j)} = \mathrm{ReLU}^{(j)} \circ \mathrm{BatchNorm}^{(j)} \circ \mathrm{Conv}^{(j)} \tag{4}$$
The $j$-th layer convolution output $O_{\mathrm{conv}}^{(j)}$ is:
$$O_{\mathrm{conv}}^{(j)} = \left[O_{\mathrm{conv}}^{(j,1)}, O_{\mathrm{conv}}^{(j,2)}, \ldots, O_{\mathrm{conv}}^{(j,\varphi)}\right] = \begin{bmatrix} o_{\mathrm{conv}}^{(j,1,1)} & o_{\mathrm{conv}}^{(j,2,1)} & \cdots & o_{\mathrm{conv}}^{(j,\varphi,1)} \\ o_{\mathrm{conv}}^{(j,1,2)} & o_{\mathrm{conv}}^{(j,2,2)} & \cdots & o_{\mathrm{conv}}^{(j,\varphi,2)} \\ \vdots & \vdots & \ddots & \vdots \\ o_{\mathrm{conv}}^{(j,1,K_j)} & o_{\mathrm{conv}}^{(j,2,K_j)} & \cdots & o_{\mathrm{conv}}^{(j,\varphi,K_j)} \end{bmatrix} \tag{5}$$
where $K_j$ is the number of convolution kernels in layer $j$. $o_{\mathrm{conv}}^{(j,s,t)} = x_s \ast k^{(j,t)}$ denotes the result of convolving the $s$-th input with the $t$-th convolution kernel in layer $j$, $o_{\mathrm{conv}}^{(j,s,t)} \in \mathbb{R}^d$, $s \in [1, \varphi]$, $t \in [1, K_j]$, and $d$ is the output vector dimension.
$O_{\mathrm{conv}}^{(j)}$ is fed into batch normalization and ReLU activation to obtain the $j$-th layer output, and multilayer concatenation operations yield the output of the convolutional branch $O_{\mathrm{conv}}$. The dual-branch feature extraction strategy output $O_{\mathrm{db}}$ is obtained by concatenating the MLP output $O_{\mathrm{MLP}}$ and the 1D-CNN output $O_{\mathrm{conv}}$, as shown in Equation (6).
$$O_{\mathrm{db}} = \mathrm{Concat}(O_{\mathrm{MLP}}, O_{\mathrm{conv}}) \tag{6}$$
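The data flow of the two branches can be illustrated with a minimal sketch in plain Python (no PyTorch, so that it runs without dependencies). It is not the authors' implementation: batch normalization is omitted because it needs batch statistics, the layer widths, kernel sizes, and random weights are toy assumptions, and the convolutional branch simply flattens its last layer rather than reproducing the multilayer concatenation described above.

```python
import math
import random

def gelu(x):
    # Exact GELU: x * Phi(x), with Phi the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def dense(x, W, b):
    # Affine map W x + b for one feature vector x.
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]

def mlp_branch(x, layers):
    # Three affine layers with GELU on the two hidden outputs (BatchNorm omitted).
    for idx, (W, b) in enumerate(layers):
        x = dense(x, W, b)
        if idx < len(layers) - 1:
            x = [gelu(v) for v in x]
    return x

def conv1d(signal, kernel):
    # "Valid" 1D sliding dot product, as computed by deep learning conv layers.
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def conv_layer(channels, kernels):
    # kernels[out][inp] is the 1D kernel linking input channel inp to output channel out.
    out = []
    for per_input in kernels:
        maps = [conv1d(sig, ker) for sig, ker in zip(channels, per_input)]
        out.append([max(0.0, sum(vals)) for vals in zip(*maps)])  # sum, then ReLU
    return out

def conv_branch(x, layers):
    channels = [x]  # a single input channel holding the feature vector
    for kernels in layers:
        channels = conv_layer(channels, kernels)
    return [v for ch in channels for v in ch]  # flatten the last layer

def dual_branch(x, mlp_layers, conv_layers):
    # Concatenate the two branch outputs, as in Equation (6).
    return mlp_branch(x, mlp_layers) + conv_branch(x, conv_layers)

rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

n = 8  # toy feature dimension (the paper uses 1140 QEEG features)
mlp_layers = [(rand_matrix(6, n), [0.0] * 6),
              (rand_matrix(4, 6), [0.0] * 4),
              (rand_matrix(3, 4), [0.0] * 3)]
# Channel counts 1 -> 2 -> 2 -> 1, kernel size 3 throughout (toy values).
conv_layers = [[rand_matrix(1, 3) for _ in range(2)],
               [rand_matrix(2, 3) for _ in range(2)],
               [rand_matrix(2, 3) for _ in range(1)]]
x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
o_db = dual_branch(x, mlp_layers, conv_layers)
print(len(o_db))  # 5: three MLP outputs plus two convolutional outputs
```

The point of the sketch is the shape of the computation: the MLP sees the whole feature vector at once, the convolution only sees sliding windows of three neighbouring features, and the concatenated vector carries both views to the attention stage.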

3.3. The MHAM with a Dynamic Routing

The MHAM enhances the weighting of the core elements in $O_{\mathrm{db}}$. On this basis, the dynamic routing of capsule neural networks [31] is integrated to combine and optimize the feature space hierarchy. $O_{\mathrm{db}}$ is linearly transformed to obtain the query vector $Q$, the key vector $K$, and the value vector $V$. Denote $O_{\mathrm{db}} = [o_{\mathrm{db}}^{(1)}, o_{\mathrm{db}}^{(2)}, \ldots, o_{\mathrm{db}}^{(\varphi)}]$, $o_{\mathrm{db}}^{(i)} \in \mathbb{R}^d$, where $d$ is the output dimension of the dual-branch feature extraction strategy; then we have
$$Q, K, V = o_{\mathrm{db}} W^Q,\; o_{\mathrm{db}} W^K,\; o_{\mathrm{db}} W^V \tag{7}$$
where $Q, K, V \in \mathbb{R}^d$, and $W^Q, W^K, W^V \in \mathbb{R}^{d \times d}$ are the linear layer weight matrices.
We determine the number of attention heads h in the MHAM according to the following:
  • Refer to the criterion for the value of h in [26].
  • Consider the dimension-matching constraint $d \bmod h = 0$.
  • The computational overhead associated with increasing h is non-negligible.
  • Combine the research needs and pre-experiments to give the final value.
According to $h$, $Q$, $K$, and $V$ can be grouped into $Q^m, K^m, V^m \in \mathbb{R}^{h \times d_k}$, where $d = h \times d_k$ and $d_k$ is the vector dimension corresponding to each head. $Q^m = \{Q_1^m, Q_2^m, \ldots, Q_h^m\}$, $K^m = \{K_1^m, K_2^m, \ldots, K_h^m\}$, and $V^m = \{V_1^m, V_2^m, \ldots, V_h^m\}$. Each head output can be obtained from Scaled Dot-Product Attention [26]:
$$\mathrm{head}_i = \mathrm{Softmax}\left(\frac{Q_i^m (K_i^m)^T}{\sqrt{d_k}}\right) V_i^m, \quad i \in [1, h] \tag{8}$$
where $\mathrm{head}_i \in \mathbb{R}^{d_k}$ is the output of the $i$-th head. The output $o_{\mathrm{MH}}$ of the MHAM is obtained by concatenating the output of each head and performing a linear transformation, computed as
$$o_{\mathrm{MH}} = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^O \tag{9}$$
where $o_{\mathrm{MH}} \in \mathbb{R}^d$ and $W^O \in \mathbb{R}^{d \times d}$ is the weight matrix of the linear layer.
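The head splitting and the scaled dot-product attention above can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' code: it attends over a short sequence of token rows (an assumption made so that the Softmax has several scores to normalize), it omits the learned projections $W^Q$, $W^K$, $W^V$, and $W^O$, and the function names are hypothetical.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V are lists of rows (tokens x d_k); returns (outputs, weights).
    d_k = len(K[0])
    weights = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights.append(softmax(scores))
    outputs = [[sum(w * v[c] for w, v in zip(row, V)) for c in range(len(V[0]))]
               for row in weights]
    return outputs, weights

def split_heads(x, h):
    # Split each d-dimensional row into h chunks of size d_k = d / h.
    d = len(x[0])
    if d % h != 0:
        raise ValueError("d must be divisible by the number of heads h")
    d_k = d // h
    return [[row[i * d_k:(i + 1) * d_k] for row in x] for i in range(h)]

def multi_head_attention(Q, K, V, h):
    heads = [scaled_dot_product_attention(q, k, v)[0]
             for q, k, v in zip(split_heads(Q, h), split_heads(K, h), split_heads(V, h))]
    # Concatenate the head outputs row by row (output projection omitted).
    return [sum((head[r] for head in heads), []) for r in range(len(Q))]

tokens = [[1.0, 0.0, 0.0, 1.0],
          [0.0, 1.0, 1.0, 0.0],
          [1.0, 1.0, 0.0, 0.0]]
out = multi_head_attention(tokens, tokens, tokens, h=2)
print(len(out), len(out[0]))  # 3 4
```

The `ValueError` in `split_heads` corresponds to the dimension-matching constraint on $h$ listed above: with $d = 4$, two heads are valid while three are not.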
In the dynamic routing, the prediction vector $\hat{o}_{\mathrm{MH}}$ is computed from the transformation matrix $Z \in \mathbb{R}^{p \times d}$, where $p$ is the product of the number and the dimension of the output vectors.
$$\hat{o}_{\mathrm{MH}} = Z\, o_{\mathrm{MH}} \tag{10}$$
The output of the dynamic routing, $\hat{o}_{\mathrm{dr}}$, can be calculated according to Equation (11).
$$\hat{o}_{\mathrm{MH}} = \eta\, \hat{o}_{\mathrm{MH}}, \qquad \hat{o}_{\mathrm{dr}} = \frac{\lVert \hat{o}_{\mathrm{MH}} \rVert^2}{1 + \lVert \hat{o}_{\mathrm{MH}} \rVert^2}\, \frac{\hat{o}_{\mathrm{MH}}}{\lVert \hat{o}_{\mathrm{MH}} \rVert} \tag{11}$$
where $\eta$ is the coupling coefficient, which can be calculated from Equation (12).
$$\eta = \mathrm{Softmax}(\psi) = \frac{\exp(\psi)}{\sum_q \exp(\psi_q)} \tag{12}$$
where $q$ is the number of output vectors and $\psi$ is the routing weight. The loop iterates to update the routing weights, $\psi \leftarrow \psi + \hat{o}_{\mathrm{MH}} \cdot \hat{o}_{\mathrm{dr}}$. Further, the coupling coefficients $\eta$ and the output $\hat{o}_{\mathrm{dr}}$ are updated according to Equations (11) and (12). The dynamic routing output $o_{\mathrm{dr}}$ is obtained once the specified number of iterations is reached. The batch output of the dynamic routing is $O_{\mathrm{dm}} = [o_{\mathrm{dr}}^{(1)}, o_{\mathrm{dr}}^{(2)}, \ldots, o_{\mathrm{dr}}^{(\varphi)}]$. The flattened $O_{\mathrm{dm}}$ is fed into the fully connected network for dimensional matching. Finally, the Softmax calculation is performed to obtain the detection results, as shown in Equation (13).
$$\hat{O}_f = (\mathrm{Softmax} \circ \mathrm{Linear} \circ \mathrm{Flatten})(O_{\mathrm{dm}}) \tag{13}$$
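The routing iteration of Equations (10) to (12) can be sketched as below for a single sample. The sketch rests on assumptions: the prediction vectors are given directly (standing in for the reshaped product $Z o_{\mathrm{MH}}$), three iterations are used as in common capsule-network practice because the paper does not state the count here, and the unweighted prediction vectors are kept for the agreement update.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def squash(v):
    # Capsule squashing: maps the norm into [0, 1) and keeps the direction.
    sq = sum(x * x for x in v)
    if sq == 0.0:
        return [0.0] * len(v)
    scale = sq / (1.0 + sq) / math.sqrt(sq)
    return [scale * x for x in v]

def dynamic_routing(predictions, n_iter=3):
    # predictions: q prediction vectors; n_iter is an assumed iteration count.
    psi = [0.0] * len(predictions)  # routing weights start at zero
    eta, outputs = [], []
    for _ in range(n_iter):
        eta = softmax(psi)  # coupling coefficients, Equation (12)
        # Weight each prediction by its coupling coefficient, then squash, Equation (11).
        outputs = [squash([c * x for x in p]) for c, p in zip(eta, predictions)]
        # Agreement update: psi <- psi + <prediction, output>.
        psi = [w + sum(a * b for a, b in zip(p, o))
               for w, p, o in zip(psi, predictions, outputs)]
    return outputs, eta

predictions = [[3.0, 4.0], [0.1, 0.0]]  # toy prediction vectors
outputs, eta = dynamic_routing(predictions)
print(eta[0] > eta[1])  # True: the stronger prediction receives more coupling
```

Because agreement is a dot product between a prediction and its squashed output, vectors with a larger norm accumulate routing weight faster, which is how the routing concentrates on the dominant components.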
The cross-entropy loss function is chosen to measure the deviation of the predicted value from the target value during error back-propagation:
$$Loss = -\frac{1}{\varphi} \sum_{i=1}^{\varphi} O_f^{\,i} \log\left(\hat{O}_f^{\,i}\right) \tag{14}$$
where $O_f$ is the true label of the sample. After obtaining the loss from Equation (14), the parameters of the DBN are iteratively updated according to the back-propagation principle to complete the training process.

3.4. Explainability of the DBN: SHAP Analysis

SHAP analysis allows a quantitative explanation of detection results and improves the transparency of the DBN. Its core is the Shapley value, which measures the marginal contribution of each feature to the outcome. Denote the set of raw features as $\Phi = \{\mu_1, \mu_2, \ldots, \mu_\vartheta\}$, where $\mu \in \mathbb{R}^u$, $\vartheta$ is the total number of features, and $u$ is the number of samples; then the Shapley value of feature $i$ is:
$$\zeta_i = \sum_{S \subseteq \Phi \setminus \{\mu_i\}} \frac{|S|!\,(\vartheta - |S| - 1)!}{\vartheta!} \left( y(S \cup \{\mu_i\}) - y(S) \right) \tag{15}$$
where $S \subseteq \Phi \setminus \{\mu_i\}$ ranges over the feature subsets that do not contain feature $i$, and $S \cup \{\mu_i\}$ is the corresponding subset containing feature $i$. $y(S \cup \{\mu_i\})$ is the detection result for the subset containing feature $i$, and $y(S)$ is the detection result for the subset that does not contain feature $i$.
According to Equation (15), $\zeta_i$ indicates the relative importance of QEEG features for MD detection results. The higher $\zeta_i$ is, the greater the contribution of feature $i$ to MD detection. Since the DBN is a kind of DNN, we choose the GradientExplainer to complete the feature importance assessment and the Shapley value dependency calculation.
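To make Equation (15) concrete, the following sketch enumerates all subsets and computes exact Shapley values for a toy value function. It is only illustrative: exact enumeration is exponential in the number of features and is not what GradientExplainer does (that method approximates the values from gradients), and the three feature names and their contributions are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value):
    # Exact Shapley values; `value` maps a frozenset of feature names to a score.
    n = len(features)
    result = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {f}) - value(s))  # marginal contribution
        result[f] = total
    return result

def toy_value(subset):
    # Hypothetical "model": two additive effects plus one interaction term.
    score = 0.0
    if "coh_fp1_fp2" in subset:
        score += 0.30
    if "psd_alpha_o1" in subset:
        score += 0.10
    if "coh_fp1_fp2" in subset and "psd_theta_fz" in subset:
        score += 0.20
    return score

features = ["coh_fp1_fp2", "psd_alpha_o1", "psd_theta_fz"]
zeta = shapley_values(features, toy_value)
for name in features:
    print(name, round(zeta[name], 4))
```

In this toy case the interaction term is split equally between the two features that produce it, and the values sum to the difference between the score of the full feature set and that of the empty set, which is the efficiency property that makes Shapley values suitable for attribution.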

4. Dataset and Experimental Settings

4.1. Dataset Introduction

The dataset was provided by [32], approved by the ethical review board (20-2019-16), and made publicly available. It was obtained from the Resting State Assessment Study at SMG-SNU Boramae Medical Center, South Korea, from January 2011 to December 2018. The initial diagnosis was made by a psychiatrist based on DSM-IV criteria promulgated by the American Psychiatric Association (APA) [33]. Clinical confirmation of diagnostic findings was completed by two psychiatrists and two psychologists in 2019.
The dataset consists of 945 subjects’ medical records, psychologically assessed IQ scores, and QEEG features from resting-state assessments. A total of 850 patients with MDs and 95 healthy controls are included. The inclusion criteria are as follows: age between 18 and 70 years old; fulfillment of the diagnosis shown in Table 1; and no difficulty in reading, hearing, writing, or understanding the Korean language. The exclusion criteria are as follows: lifetime and current medical history of neurologic disease or brain injury, neurodevelopmental disorders (i.e., intellectual disability with IQ < 70 or borderline intellectual functioning (70 < IQ < 80), tic disorders, ADHD), or any neurocognitive disorder. Our research focuses on six main categories and nine subcategories underneath, as detailed in Table 1. Among them, the OCD and the SCZ are not further subdivided.
The detailed processing flow and equipment used for QEEG acquisition are recorded in [32]. We visualize the flow in Figure 2 to aid understanding. The QEEG consists of 19 channels of data collected during a 5 min eyes-closed resting state. Channels and sensor locations are identified according to the international 10–20 system. The acquisition channels are Fp1, Fp2, F7, F3, Fz, F4, F8, T7, C3, Cz, C4, T8, P7, P3, Pz, P4, P8, O1, and O2, and the sensor locations are marked in Figure 3. The QEEG is converted into a combination of the PSD and functional connectivity (FC) in six frequency bands: delta (1–4 Hz), theta (4–8 Hz), alpha (8–12 Hz), beta (12–25 Hz), high beta (25–30 Hz), and gamma (30–40 Hz). The FC is represented by the coherence value. Figure 3 shows the mean PSD values for each band across the six categories of MDs, visualized with MNE-Python [34]. The dimension of the processed QEEG features is 1140, comprising the power spectral density ($n_1 = 19 \times 6$) and the FC ($n_2 = 171 \times 6$). We use the QEEG features in our study and split them into training and testing sets at an 8:2 ratio.
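The feature count can be checked with a few lines of Python. The sketch assumes that the 171 coherence values per band correspond to the unordered pairs of the 19 channels, which matches $\binom{19}{2} = 171$; the function name is hypothetical.

```python
from math import comb

CHANNELS = ["Fp1", "Fp2", "F7", "F3", "Fz", "F4", "F8", "T7", "C3", "Cz",
            "C4", "T8", "P7", "P3", "Pz", "P4", "P8", "O1", "O2"]
BANDS = ["delta", "theta", "alpha", "beta", "high beta", "gamma"]

def qeeg_feature_counts(n_channels, n_bands):
    # Return (PSD count, coherence count, total) for one subject.
    n_psd = n_channels * n_bands   # one PSD value per channel and band
    n_pairs = comb(n_channels, 2)  # unordered channel pairs (assumed for coherence)
    n_coh = n_pairs * n_bands      # one coherence value per pair and band
    return n_psd, n_coh, n_psd + n_coh

counts = qeeg_feature_counts(len(CHANNELS), len(BANDS))
print(counts)  # (114, 1026, 1140)
```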

4.2. Experimental Settings

The experimental simulation environment is built based on Windows 11 and the PyTorch deep learning framework. Acceleration of the training process is performed using a GPU, corresponding to CUDA version 12.0. The programming language is Python, version 3.8.0, and Torch version 1.12.0. The cross-entropy loss function and the Adam optimizer are chosen for the experiment. The maximum number of iterations is 500, and the minimum number of iterations is 20. The early stopping strategy is used to improve the overfitting problem that may occur during model training. Details of the experimental parameter configurations are shown in Table 2.
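The interplay of the iteration limits and early stopping can be sketched as follows. The reading that early stopping cannot trigger before the minimum of 20 iterations, the patience of 10, and the use of a validation loss as the monitored quantity are all assumptions, since the text does not spell out these details; the function name is hypothetical.

```python
def stopping_epoch(val_losses, min_epochs=20, max_epochs=500, patience=10):
    # Return (epoch at which training stops, best loss seen).
    # val_losses stands in for the per-epoch validation loss of a real training loop.
    best = float("inf")
    stale = 0  # epochs since the last improvement
    epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
        if epoch >= max_epochs:
            break  # hard upper limit on iterations
        if epoch >= min_epochs and stale >= patience:
            break  # no improvement for `patience` epochs after the minimum
    return epoch, best

# Loss improves for 30 epochs, then plateaus.
losses = [1.0 / e for e in range(1, 31)] + [1.0] * 100
print(stopping_epoch(losses)[0])  # 40
```

With a loss that never improves, the same function stops at epoch 20, and with a loss that improves forever it stops at epoch 500, so both limits from the experimental settings are exercised.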

4.3. Performance Evaluation Metrics

Detection errors in any category carry a high cost in clinical diagnosis, and all samples require equal treatment. Hence, we use macro-averaged metrics for performance evaluation. To evaluate the performance of the DBN, we choose the confusion matrix, Accuracy ($Acc$), macro-average F1 score ($F1_{\mathrm{macro}}$), and ROC curve as evaluation metrics, as stated below:
Confusion matrix: It describes the correspondence between the predicted and true values for the MDs/healthy subjects in matrix form.
$Acc$: It describes the percentage of correctly identified MDs and healthy subjects out of all samples detected. It is calculated as follows:
$$Acc = \frac{TP + TN}{TP + TN + FP + FN} \tag{16}$$
$F1_{\mathrm{macro}}$: It is the harmonic mean of the macro-average Precision ($P_{\mathrm{macro}}$) and the macro-average Recall ($R_{\mathrm{macro}}$). $P_{\mathrm{macro}}$ denotes the mean, over the two categories, of the percentage of samples predicted to be MDs/healthy that are truly MDs/healthy. $R_{\mathrm{macro}}$ denotes the mean, over the two categories, of the percentage of truly MDs/healthy samples that are also predicted to be MDs/healthy.
$F1_{\mathrm{macro}}$ is calculated as follows:
$$F1_{\mathrm{macro}} = \frac{2 \times P_{\mathrm{macro}} \times R_{\mathrm{macro}}}{P_{\mathrm{macro}} + R_{\mathrm{macro}}}, \qquad P_{\mathrm{macro}} = \frac{1}{2} \sum_{i=1}^{2} \frac{TP_i}{TP_i + FP_i}, \qquad R_{\mathrm{macro}} = \frac{1}{2} \sum_{i=1}^{2} \frac{TP_i}{TP_i + FN_i} \tag{17}$$
ROC curve: It reflects the effectiveness of the DBN detection through the relationship between the False Positive Rate ($FPR$) and the True Positive Rate ($TPR$). The effectiveness of MD detection can be read directly from the area between the curve and the abscissa (the AUC value).
In Equations (16) and (17), $TP$ indicates the number of samples in which patients with real MDs are predicted to be ill, $FP$ describes the number of healthy subjects detected to have MDs, $TN$ is the number of healthy subjects detected to be healthy, and $FN$ denotes the number of MD subjects predicted to be healthy.
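Equations (16) and (17) translate directly into code. The sketch below follows the definition used here, in which $F1_{\mathrm{macro}}$ is the harmonic mean of the macro-averaged precision and recall (some libraries instead average the per-class F1 scores, which can give a slightly different number). The confusion counts in the example are made up for illustration.

```python
def binary_macro_metrics(tp, fp, tn, fn):
    # Accuracy and macro-averaged precision, recall, and F1 for a two-class task.
    acc = (tp + tn) / (tp + tn + fp + fn)
    # Class 1 = MD (positive). For class 2 = healthy the roles are mirrored:
    # its true positives are TN, its false positives are FN, its false negatives are FP.
    per_class = [(tp, fp, fn), (tn, fn, fp)]
    p_macro = sum(t / (t + f_p) if t + f_p else 0.0 for t, f_p, _ in per_class) / 2
    r_macro = sum(t / (t + f_n) if t + f_n else 0.0 for t, _, f_n in per_class) / 2
    denom = p_macro + r_macro
    f1_macro = 2 * p_macro * r_macro / denom if denom else 0.0
    return acc, p_macro, r_macro, f1_macro

# Illustrative counts: 12 MD subjects and 8 healthy controls.
acc, p_macro, r_macro, f1_macro = binary_macro_metrics(tp=8, fp=2, tn=6, fn=4)
print(round(acc, 3), round(p_macro, 3), round(r_macro, 3), round(f1_macro, 3))
```

Macro averaging gives the healthy class the same weight as the MD class even though it contains fewer subjects, which is why it suits the imbalanced MD versus healthy-control splits in this dataset.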

5. Results and Discussion

5.1. Effectiveness of the DBN in the Detection of MDs

We decomposed the detection of MDs into binary classification tasks and evaluated the performance on the test set. Figure 4 shows the confusion matrix of the detection results for the six main MDs. Detection results for the nine subcategories are given in Figure 5. The horizontal axis of the confusion matrix is the prediction, and the vertical axis is the true clinical diagnosis. The darker the color, the greater the number of subjects.
As seen, the six main MDs are recognized correctly as AddD (95%), AnxD (81%), MD (91%), OCD (75%), SCZ (87%), and TSRD (87%). The recognition rates for the nine subcategories are as follows: ASD (63%), AjD (50%), AUD (76%), BAD (89%), BD (69%), DD (84%), PD (78%), PTSD (82%), and SAD (70%). The results show that the DBN can fulfill the task of detecting different categories of MDs. In parallel, we also note that the ASD and the AjD have lower recognition accuracies. The reason is that the DBN does not learn enough features for small sample objects. Improvement is expected through sample enhancement.
From the experimental results, the DBN is adaptable to screening for multiple categories of MDs, which is consistent with clinical practice. The DBN provides important support for the adjunctive diagnosis of clinically complex diseases or comorbid conditions.

5.2. Ablation Experiments: Structural Validity Analysis

To verify the validity of the DBN structure, we disassembled the model and conducted ablation experiments. We evaluated the DBN performance in terms of $Acc$ and $F1_{\mathrm{macro}}$, and the results are shown in Table 3 and Table 4. We masked parts of the network structure to evaluate the effectiveness of the dual-branch feature extraction strategy and the dynamic routing.
First, we removed the MLP and the 1D-CNN from the dual-branch feature extraction strategy in turn. Among the six main categories, the dual-branch strategy outperforms the MLP alone in $Acc$ and $F1_{\mathrm{macro}}$ in four categories and outperforms the 1D-CNN alone in five categories. The detection results for the nine subcategories also show the effectiveness of the dual-branch feature extraction strategy. It outperforms the MLP in both $Acc$ and $F1_{\mathrm{macro}}$ in seven detection tasks and leads the 1D-CNN in six.
In addition, we analyzed the structural validity of the MHAM with dynamic routing. The analysis reveals that the dynamic routing performs best for the six main categories of MDs. Among them, the $Acc$ and $F1_{\mathrm{macro}}$ for OCD improve the most, by 11% and 12%, respectively. AnxD detection improves by 5% and 7%, and SCZ improves by 2%. The performance of the remaining three categories is unchanged. Detection also improves in most of the nine subcategory experiments. However, DD detection is ineffective: the evaluation metrics show that the DBN recognizes the healthy controls (HC) poorly in this experiment. This is due to the unbalanced sample shares of the two categories, which require data governance.
The ablation experiments show that the dual-branch feature extraction strategy can mine global complex nonlinear relationships and local features in QEEG features. This provides a rich and reliable verdict for the detection of MDs. And the MHAM with dynamic routing can effectively combine and deliver key information to support the detection of MDs.

5.3. Comparative Experiment: Performance Advanced Level Analysis

We designed a 5-fold cross-validation experiment to benchmark the DBN against established models. We compared the DBN with machine learning models commonly used for MD prediction (e.g., SVM [35], Random Forest [36], XGBoost [37], and LightGBM [38]). The performance comparison results are presented as ROC curves in Figure 6 and Figure 7. We recorded the $TPR$, $FPR$, and AUC value for each fold and averaged them to obtain the final results. The subplots in Figure 6 and Figure 7 show the results of the binary classification for different MDs and healthy controls.
AUC value is our main observation. As can be seen in Figure 6, the DBN improves over the optimal model by 3%, 4%, and 2% for the three detection tasks of AnxD, OCD, and TSRD. The other three tasks perform similarly to the optimal model. Of the nine subcategories, seven perform optimally and two perform similarly. It demonstrates that the DBN is more adaptable to new samples than machine learning.
The shape of the ROC curve is also an important indicator of performance. The ROC curve of the DBN tends to be steeper in the early stages in most tasks, especially in the detection of the nine subcategories. This indicates that the DBN maintains a high $TPR$ even at a low $FPR$ and shows strong discrimination ability for different categories. In addition, the ROC curves of the various models intersect. Therefore, no model has an “absolute advantage”, which places demands on the adaptability of the model. The DBN outperforms the comparison models at low $FPR$ and is suitable for medical diagnosis scenarios with low tolerance for false positives.
The lack of data significantly affects all models. As the results show, the ROC curves of all models exhibit sawtooth fluctuations rather than smooth curves. The reason is that we split the raw dataset, resulting in a limited sample size. With a limited test set, changes in the threshold have a discrete effect on $TPR$ and $FPR$, resulting in a sawtooth-shaped ROC curve. Despite this issue, the overall trend of the average ROC curve and its AUC value remains a reliable indicator for evaluating performance.
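The discrete steps described above follow directly from how an ROC curve is built. The sketch below sweeps the decision threshold over the sorted scores of a toy six-sample test set and integrates the curve with the trapezoidal rule; the labels and scores are invented for illustration, and the function names are hypothetical.

```python
def roc_points(labels, scores):
    # ROC points (FPR, TPR) obtained by lowering the threshold over sorted scores.
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    prev = None
    for score, label in ranked:
        if prev is not None and score != prev:
            points.append((fp / neg, tp / pos))  # emit a point only when the score changes
        if label == 1:
            tp += 1
        else:
            fp += 1
        prev = score
    points.append((fp / neg, tp / pos))
    return points

def auc(points):
    # Trapezoidal area under the piecewise-linear ROC curve.
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

labels = [1, 1, 0, 1, 0, 0]              # 1 = MD, 0 = healthy control (toy data)
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]  # predicted probability of MD
pts = roc_points(labels, scores)
print(pts)
print(auc(pts))
```

With only three positive samples, $TPR$ can take just four distinct values (0, 1/3, 2/3, 1), so the curve necessarily moves in steps; averaging over folds smooths the steps only partially.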
Cross-validation results show that machine learning models perform very differently in detecting multiple subcategories of MDs. And the DBN can be adapted to the detection scenarios of the multiple subcategories and outperforms machine learning models for tabular data detection. The DBN exhibits robust prediction for the multiple categories of MDs and generalization for the emergence of new cases.

5.4. SHAP Analysis

Feature attribution analysis can effectively enhance the explainability of the DBN detection results, which is important for clinical applications. We implemented the DBN explainability research using SHAP analysis, as shown in Figure 8 and Figure 9. The contributions of different QEEG features to the detection of MDs are summarized in two groups of beeswarm plots.
In a beeswarm plot, the horizontal axis indicates the Shapley values of QEEG features, reflecting their impact on the detection results. The left-hand side ranks the important features by mean absolute Shapley value. Each point represents the feature value of one sample, with color encoding the raw value. The further to the right a point lies, the greater its positive effect on the output; the further to the left, the greater its negative effect.
The results show that the main influencing factors are coherence values, e.g., COH.A.delta.a.FP1.b.FP2 and COH.C.alpha.r.O1.s.O2. This indicates that the synergistic effect of neural activity in different brain regions has a significant impact on the detection of MDs. In particular, COH.A.delta.a.FP1.b.FP2 has significant effects on the detection of two main categories and five subcategories, which suggests that prefrontal functional connectivity supports the detection of a variety of MDs. This finding is consistent with previous research [39], which concluded that prefrontal EEG is closely related to MDs and used three electrodes (Fp1, Fpz, and Fp2) to acquire signals for detecting major depression.
Therefore, the SHAP is effective in providing a reliable explanation for the attribution analysis of the DBN detection results. It reveals the dependence of the pathogenesis of multiple MDs on functional brain connectivity. Thus, it enhances the transparency and reliability of the predictive results of the DBN. Our model provides novel ideas for intelligent clinical decision support.
Although the DBN improves the efficiency and explainability of MD detection, it has limitations. First, the dataset is small. The raw dataset contains only 945 samples, and splitting it into binary classification tasks reduces the data available to each task even further. The lack of samples leads to unstable training and uneven detection by the machine learning models, as shown in Section 5.3. Although the DBN improves detection quality, the small sample size still limits its feature learning. For example, it detects ASD and AjD with low correct-detection rates (Section 5.1) and does not discriminate sufficiently between DD and HC (Section 5.2). Second, decomposing the detection into binary classification problems increases the workload. However, these problems are not insurmountable. The gradual improvement of medical data sharing policies and the development of wearable smart detection devices should ease the data limitation. Multi-category detection of MDs supported by large-scale data can then be developed as well. Despite these limitations, the performance of the DBN gives reason to believe that DNNs have a promising future in clinical MD detection.
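The effect of the binary split on sample size can be made concrete with the counts in Table 1. The sketch below assumes that each binary task pairs one main category with the healthy controls (HC); this pairing is an assumption made for illustration, and the helper name `binary_task_sizes` is hypothetical.

```python
# Patient counts for the main categories, copied from Table 1.
MAIN_COUNTS = {
    "AddD": 186, "AnxD": 107, "HC": 95, "MD": 266,
    "OCD": 46, "SCZ": 117, "TSRD": 128,
}

def binary_task_sizes(counts, control="HC"):
    """Return {category: (n_samples, patient_fraction)} for category-vs-control tasks.

    Assumes each task uses only the patients of one category plus all controls.
    """
    n_control = counts[control]
    sizes = {}
    for label, n in counts.items():
        if label == control:
            continue
        total = n + n_control
        sizes[label] = (total, n / total)
    return sizes

total_samples = sum(MAIN_COUNTS.values())
sizes = binary_task_sizes(MAIN_COUNTS)
print("all samples:", total_samples)
for label, (n, frac) in sizes.items():
    print(f"{label} vs HC: {n} samples, {frac:.0%} patients")
```

Under this assumption no single task sees more than 361 of the 945 samples, and the smallest task (OCD vs. HC) has only 141, with class proportions that differ from task to task.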

6. Conclusions

We propose the DBN to improve the detection of multiple MDs from QEEG features. The DBN is evaluated in terms of detection accuracy, structural validity, performance gains, and explainability. According to the results, the DBN accurately detects most categories of MDs and outperforms the machine learning models. The ablation analysis demonstrates the effectiveness of the dual-branch feature extraction strategy and of the MHAM with dynamic routing. Additionally, SHAP analysis enhances the explainability of the DBN detection results by visualizing the influence of the QEEG features. Overall, the DBN improves the efficiency of computer-aided MD detection. Moreover, relying only on EEG signals as the basis for judgment simplifies the clinical screening process and reduces the financial burden on patients.
Our work also has limitations. The limited data resources constrain further improvement of the DBN. In subsequent work, we will attempt to integrate medical data resources and introduce sample augmentation to improve the quality of MD detection. Moreover, although k-fold cross-validation (k = 5) demonstrates the strong performance of the DBN, we have not systematically evaluated the impact of different random initializations on the results. Therefore, in future work, we will further validate the robustness of the model through large-scale experiments over multiple random seeds.
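A multi-seed protocol of this kind can be sketched as repeated stratified k-fold cross-validation with the spread across seeds reported next to the mean. The code below is an illustration under stated assumptions, not the authors' procedure: `majority_baseline` is a hypothetical stand-in for training and scoring the DBN on one fold, and the labels use the OCD and HC counts from Table 1.

```python
import random
from statistics import mean, stdev

def stratified_kfold(labels, k, seed):
    """Yield (train_idx, test_idx) pairs with class proportions roughly preserved."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin into the folds.
        for pos, i in enumerate(indices):
            folds[pos % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test

def multi_seed_cv(labels, evaluate, seeds, k=5):
    """Run k-fold CV once per seed; return mean and std of the per-seed scores."""
    per_seed = []
    for seed in seeds:
        fold_scores = [
            evaluate(tr, te, seed) for tr, te in stratified_kfold(labels, k, seed)
        ]
        per_seed.append(mean(fold_scores))
    return mean(per_seed), stdev(per_seed)

# Hypothetical stand-in for "train the model on train_idx, score it on test_idx":
# it always predicts the majority class of the training split.
def majority_baseline(train_idx, test_idx, seed):
    train_labels = [labels[i] for i in train_idx]
    majority = max(set(train_labels), key=train_labels.count)
    return sum(labels[i] == majority for i in test_idx) / len(test_idx)

labels = [1] * 46 + [0] * 95  # OCD vs. HC counts from Table 1
avg, spread = multi_seed_cv(labels, majority_baseline, seeds=range(10))
print(f"accuracy: {avg:.3f} +/- {spread:.3f}")
```

The stand-in is deterministic, so its spread across seeds is zero; a real network would pass the seed to its weight initialization and data shuffling, and the reported standard deviation would then quantify the sensitivity that the paragraph above leaves open.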

Author Contributions

All authors, L.Z., H.C. and Y.P., contributed to different aspects of this work. L.Z. and H.C. were responsible for conceptualization, data collection, software implementation, and experimental validation. L.Z. also led the original draft preparation and manuscript revision. H.C. contributed to the formal analysis and manuscript review. Y.P. contributed to data analysis and model construction, and provided ongoing guidance throughout the research process. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original dataset presented in this study can be found in online repositories at https://osf.io/8bsvr/ (accessed on 24 December 2024). If there are difficulties accessing the website, the alternative URL is https://www.kaggle.com/datasets/shashwatwork/eeg-psychiatric-disorders-dataset (accessed on 24 December 2024). The code is open source at https://github.com/longhao-zhang-ustb/DBN_MODEL (accessed on 30 August 2025). Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
DBN: Dual-Branch Network
MDs: Mental Disorders
EEG: Electroencephalogram
QEEG: Quantitative Electroencephalogram
SHAP: Shapley Additive exPlanations
AnxD: Anxiety Disorder
OCD: Obsessive–Compulsive Disorder
SCZ: Schizophrenia
DNNs: Deep Neural Networks
PTSD: Post-Traumatic Stress Disorder
MHAM: Multi-Head Attention Mechanism
MLP: Multilayer Perceptron
CNNs: Convolutional Neural Networks
MDD: Major Depressive Disorder
ADHD: Attention Deficit Hyperactivity Disorder
fMRI: functional Magnetic Resonance Imaging
APA: American Psychiatric Association
AddD: Addictive Disorder
AUD: Alcohol Use Disorder
BAD: Behavioral Addiction Disorder
PD: Panic Disorder
SAD: Social Anxiety Disorder
HC: Healthy Control
MD: Mood Disorder
BD: Bipolar Disorder
DD: Depressive Disorder
TSRD: Trauma- and Stress-Related Disorder
ASD: Acute Stress Disorder
AjD: Adjustment Disorder
PSD: Power Spectral Density
FC: Functional Connectivity

References

  1. Broyd, S.J.; Demanuele, C.; Debener, S.; Helps, S.K.; James, C.J.; Sonuga-Barke, E.J. Default-mode brain dysfunction in mental disorders: A systematic review. Neurosci. Biobehav. Rev. 2009, 33, 279–296.
  2. Brausch, A.M.; Whitfield, M.; Clapham, R.B. Comparisons of mental health symptoms, treatment access, and self-harm behaviors in rural adolescents before and during the COVID-19 pandemic. Eur. Child Adolesc. Psychiatry 2023, 32, 1051–1060.
  3. World Health Organization. World Mental Health Report: Transforming Mental Health for All; World Health Organization: Geneva, Switzerland, 2022.
  4. Reynolds, C.F., 3rd; Jeste, D.V.; Sachdev, P.S.; Blazer, D.G. Mental health care for older adults: Recent advances and new directions in clinical practice and research. World Psychiatry 2022, 21, 336–363.
  5. Bass, V.; Brown, F.; Beiser, D.G.; Peterson, T.; Gibbons, R.D.; Nagele, P. Preoperative assessment of anxiety and depression using computerized adaptive screening tools: A pilot prospective cohort study. Anesth. Analg. 2022, 134, 853–857.
  6. Arias, D.; Saxena, S.; Verguet, S. Quantifying the global burden of mental disorders and their economic value. eClinicalMedicine 2022, 54, 101675.
  7. Li, L.; Peng, W.; Rheu, M.M.J. Factors predicting intentions of adoption and continued use of artificial intelligence chatbots for mental health: Examining the role of UTAUT model, stigma, privacy concerns, and artificial intelligence hesitancy. Telemed. e-Health 2024, 30, 722–730.
  8. Kopanska, M.; Ochojska, D.; Trojniak, J.; Sarzynska, I.; Szczygielski, J. The role of quantitative electroencephalography in diagnostic workup of mental disorders. J. Physiol. Pharmacol. 2024, 75, 39415522.
  9. Sharma, C.M.; Chariar, V.M. Diagnosis of mental disorders using machine learning: Literature review and bibliometric mapping from 2012 to 2023. Heliyon 2024, 10, e32548.
  10. Khan, S.; Umar Saeed, S.M.; Frnda, J.; Arsalan, A.; Amin, R.; Gantassi, R.; Noorani, S.H.; Nisar, H. A machine learning based depression screening framework using temporal domain features of the electroencephalography signals. PLoS ONE 2024, 19, e0299127.
  11. Alazzawı, A.; Aljumaili, S.; Duru, A.D.; Uçan, O.N.; Bayat, O.; Coelho, P.J.; Pires, I.M. Schizophrenia diagnosis based on diverse epoch size resting-state EEG using machine learning. PeerJ Comput. Sci. 2024, 10, e2170.
  12. Yamunarani, T.; Zaki, W.S.B.W.; Elango, S.; Chandrasekaran, G.; Yamuna, S.; Kumar, K.K.; Maheswaran, S. Analysis of Post-Traumatic Stress Disorder (PTSD) Using Electroencephalography. In Proceedings of the 2024 15th International Conference on Computing Communication and Networking Technologies (ICCCNT), Kamand, India, 24–28 June 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–6.
  13. Huang, S.-M.; Wu, F.-H.; Ma, K.-J.; Wang, J.-Y. Individual and integrated indexes of inflammation predicting the risks of mental disorders-statistical analysis and artificial neural network. BMC Psychiatry 2025, 25, 226.
  14. Lundberg, S.M.; Lee, S.I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 2017, 30, 4768–4777.
  15. Bagheri, S.; Taridashti, S.; Farahani, H.; Watson, P.; Rezvani, E. Multilayer perceptron modeling for social dysfunction prediction based on general health factors in an Iranian women sample. Front. Psychiatry 2023, 14, 1283095.
  16. Mohamed, E.S.; Naqishbandi, T.A.; Bukhari, S.A.C.; Rauf, I.; Sawrikar, V.; Hussain, A. A hybrid mental health prediction model using Support Vector Machine, Multilayer Perceptron, and Random Forest algorithms. Healthc. Anal. 2023, 3, 100185.
  17. Saranya, S.; Kavitha, N. Robust feature selection with chicken swarm intelligence improved multilayer perceptron for early detection of mental illness disorder. J. Theor. Appl. Inf. Technol. 2022, 100, 3337–3345.
  18. Oweimieotu, A.E.; Akazue, M.I.; Edje, A.E. Designing a Hybrid Genetic Algorithm Trained Feedforward Neural Network for Mental Health Disorder Detection. Adv. Multidiscip. Sci. Res. J. Publ. 2024, 12, 49–62.
  19. Uyulan, C.; Ergüzel, T.T.; Unubol, H.; Cebi, M.; Sayar, G.H.; Asad, M.N.; Tarhan, N. Major depressive disorder classification based on different convolutional neural network models: Deep learning approach. Clin. EEG Neurosci. 2021, 52, 38–51.
  20. Wang, Y.; Gong, N.; Fu, C. Major depression disorder diagnosis and analysis based on structural magnetic resonance imaging and deep learning. J. Integr. Neurosci. 2021, 20, 977–984.
  21. Wang, D.; Hong, D.; Wu, Q. Attention deficit hyperactivity disorder classification based on deep learning. IEEE/ACM Trans. Comput. Biol. Bioinform. 2022, 20, 1581–1586.
  22. Akbugday, B.; Bozbas, O.A.; Cura, O.K.; Pehlivan, S.; Akan, A. Detection of Attention Deficit Hyperactivity Disorder by Using EEG Feature Maps and Deep Learning. In Proceedings of the 2023 31st European Signal Processing Conference (EUSIPCO), Helsinki, Finland, 4–8 September 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1105–1109.
  23. Ismail, N.H.; Liu, N.; Du, M.; He, Z.; Hu, X. A deep learning approach for identifying cancer survivors living with post-traumatic stress disorder on Twitter. BMC Med. Inform. Decis. Mak. 2020, 20, 1–11.
  24. Phan, D.V.; Yang, N.P.; Kuo, C.Y.; Chan, C.-L. Deep learning approaches for sleep disorder prediction in an asthma cohort. J. Asthma 2021, 58, 903–911.
  25. Huang, P. A mental disorder prediction model with the ability of deep information expression using convolution neural networks technology. Sci. Program. 2022, 2022, 4664102.
  26. Vaswani, A. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Curran Associates Inc.: Red Hook, NY, USA, 2017.
  27. Xia, M.; Zhang, Y.; Wu, Y.; Wang, X. An end-to-end deep learning model for EEG-based major depressive disorder classification. IEEE Access 2023, 11, 41337–41347.
  28. Aina, J.; Akinniyi, O.; Rahman, M.M.; Odero-Marah, V.; Khalifa, F. A hybrid Learning-Architecture for mental disorder detection using emotion recognition. IEEE Access 2024, 12, 91410–91425.
  29. McDonald, R.G.; Cargill, M.I.; Khawar, S.; Kang, E. Emotion dysregulation in autism: A meta-analysis. Autism 2024, 28, 2986–3001.
  30. Gilbert, M.; Boecker, M.; Reiss, F.; Kaman, A.; Erhart, M.; Schlack, R.; Westenhöfer, J.; Döpfner, M.; Ravens-Sieberer, U. Gender and age differences in ADHD symptoms and co-occurring depression and anxiety symptoms among children and adolescents in the BELLA study. Child Psychiatry Hum. Dev. 2023, 56, 1162–1172.
  31. Sabour, S.; Frosst, N.; Hinton, G.E. Dynamic routing between capsules. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Curran Associates Inc.: Red Hook, NY, USA, 2017; p. 30.
  32. Park, S.M.; Jeong, B.; Oh, D.Y.; Choi, C.-H.; Jung, H.Y.; Lee, J.-Y.; Lee, D.; Choi, J.-S. Identification of major psychiatric disorders from resting-state electroencephalography using a machine learning approach. Front. Psychiatry 2021, 12, 707581.
  33. Frances, A.; First, M.B.; Pincus, H.A. DSM-IV Guidebook; American Psychiatric Press: Washington, DC, USA, 1995.
  34. Gramfort, A.; Luessi, M.; Larson, E.; Engemann, D.A.; Strohmeier, D.; Brodbeck, C.; Goj, R.; Jas, M.; Brooks, T.; Parkkonen, L.; et al. MEG and EEG data analysis with MNE-Python. Front. Neurosci. 2013, 7, 00267.
  35. Sharma, C.M.; Thein, K.Y.M.; Chariar, V.M. Optimized Support Vector Machines for Detection of Mental Disorders. In Artificial Intelligence in Healthcare; CRC Press: Boca Raton, FL, USA, 2024; pp. 190–219.
  36. Adeniji, O.D.; Adeyemi, S.O.; Ajagbe, S.A. An improved bagging ensemble in predicting mental disorder using hybridized random forest-artificial neural network model. Informatica 2022, 46, 543–550.
  37. Zhu, W.; Shen, S.; Zhang, Z. Improved multiclassification of schizophrenia based on xgboost and information fusion for small datasets. Comput. Math. Methods Med. 2022, 2022, 1581958.
  38. Wang, J.; Wang, Z.; Li, J.; Peng, Y. An interpretable depression prediction model for the elderly based on ISSA optimized LightGBM. J. Beijing Inst. Technol. 2023, 32, 168–180.
  39. Cai, H.; Yuan, Z.; Gao, Y.; Sun, S.; Li, N.; Tian, F.; Xiao, H.; Li, J.; Yang, Z.; Li, X.; et al. A multi-modal open dataset for mental-disorder analysis. Sci. Data 2022, 9, 178.
Figure 1. Overall architecture design of the DBN.
Figure 2. Introduction to the dataset preprocessing process.
Figure 3. Mean value of power spectral density in each frequency band.
Figure 4. Detection results of the six main MDs.
Figure 5. Detection results for the nine subcategories.
Figure 6. Comparison of recognition performance of the six main MDs.
Figure 7. Comparison of recognition performance of the nine subcategories.
Figure 8. Beeswarm plots of the SHAP analysis for the six main categories.
Figure 9. Beeswarm plots of the SHAP analysis for the nine subcategories.
Table 1. Description of the composition of the experimental dataset.
| Main Categories | No. of Patients | Subcategories | No. of Patients |
|---|---|---|---|
| Addictive Disorder (AddD) | 186 | Alcohol Use Disorder (AUD) | 93 |
| | | Behavioral Addiction Disorder (BAD) | 93 |
| AnxD | 107 | Panic Disorder (PD) | 59 |
| | | Social Anxiety Disorder (SAD) | 48 |
| Healthy Control (HC) | 95 | | |
| Mood Disorder (MD) | 266 | Bipolar Disorder (BD) | 67 |
| | | Depressive Disorder (DD) | 199 |
| OCD | 46 | | |
| SCZ | 117 | | |
| Trauma- and Stress-Related Disorder (TSRD) | 128 | Acute Stress Disorder (ASD) | 38 |
| | | Adjustment Disorder (AjD) | 38 |
| | | PTSD | 52 |
Table 2. Details of experimental settings.
| Parameter Items | Values | Parameter Items | Values |
|---|---|---|---|
| Batch size | 4 | Max epoch | 500 |
| Min epoch | 20 | Dropout | 0.1 |
| Learning rate | 1e-5 | Optimizer | Adam |
| Patience | 0.02 | GPU | NVIDIA GeForce Mx450 |
| Python | 3.8.0 | CUDA | 12.0 |
| Torch | 1.12.0 | Operating System | Windows 11 |
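Table 3. Detection results of the six main MDs.
| Categories | Our Model Acc | Our Model F1-macro | No Dynamic Routing Acc | No Dynamic Routing F1-macro | No MLP Acc | No MLP F1-macro | No CNN Acc | No CNN F1-macro |
|---|---|---|---|---|---|---|---|---|
| AddD | 0.82 | 0.78 | 0.82 | 0.80 | 0.84 | 0.82 | 0.77 | 0.72 |
| AnxD | 0.78 | 0.77 | 0.73 | 0.70 | 0.78 | 0.77 | 0.75 | 0.75 |
| MD | 0.85 | 0.80 | 0.86 | 0.80 | 0.82 | 0.74 | 0.81 | 0.74 |
| OCD | 0.86 | 0.82 | 0.75 | 0.70 | 0.75 | 0.72 | 0.82 | 0.77 |
| SCZ | 0.82 | 0.82 | 0.80 | 0.80 | 0.80 | 0.80 | 0.75 | 0.73 |
| TSRD | 0.84 | 0.84 | 0.84 | 0.84 | 0.77 | 0.75 | 0.89 | 0.88 |
| Avg. | 0.83 | 0.81 | 0.80 | 0.77 | 0.79 | 0.77 | 0.80 | 0.77 |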
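Table 4. Detection results of the nine subcategories.
| Categories | Our Model Acc | Our Model F1-macro | No Dynamic Routing Acc | No Dynamic Routing F1-macro | No MLP Acc | No MLP F1-macro | No CNN Acc | No CNN F1-macro |
|---|---|---|---|---|---|---|---|---|
| ASD | 0.88 | 0.84 | 0.88 | 0.85 | 0.79 | 0.74 | 0.92 | 0.90 |
| AjD | 0.83 | 0.78 | 0.79 | 0.76 | 0.88 | 0.82 | 0.88 | 0.82 |
| AUD | 0.81 | 0.80 | 0.81 | 0.81 | 0.72 | 0.72 | 0.75 | 0.75 |
| BAD | 0.75 | 0.74 | 0.75 | 0.75 | 0.67 | 0.66 | 0.75 | 0.75 |
| BD | 0.84 | 0.83 | 0.84 | 0.84 | 0.75 | 0.75 | 0.75 | 0.74 |
| DD | 0.77 | 0.74 | 0.84 | 0.80 | 0.82 | 0.78 | 0.73 | 0.68 |
| PD | 0.86 | 0.84 | 0.82 | 0.82 | 0.79 | 0.77 | 0.64 | 0.63 |
| PTSD | 0.89 | 0.89 | 0.86 | 0.84 | 0.79 | 0.78 | 0.79 | 0.78 |
| SAD | 0.89 | 0.87 | 0.86 | 0.84 | 0.82 | 0.79 | 0.75 | 0.72 |
| Avg. | 0.84 | 0.81 | 0.83 | 0.81 | 0.78 | 0.76 | 0.77 | 0.75 |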
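For readers less familiar with the two metrics reported in Tables 3 and 4, the following sketch shows how accuracy and macro-averaged F1 are commonly computed from predicted labels. It uses the standard definitions and a toy label vector; it is an illustration, not the authors' evaluation code, and the function names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_macro(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy imbalanced fold: 8 controls (0) and 2 patients (1); one patient is missed.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
print(f"Acc = {accuracy(y_true, y_pred):.2f}, F1-macro = {f1_macro(y_true, y_pred):.2f}")
```

Because the macro average weights both classes equally, a missed patient in the minority class lowers F1-macro more than it lowers accuracy, which is one reason the F1-macro columns tend to sit at or below the Acc columns.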
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zhang, L.; Cui, H.; Peng, Y. DBN: A Dual-Branch Network for Detecting Multiple Categories of Mental Disorders. Information 2025, 16, 755. https://doi.org/10.3390/info16090755

AMA Style

Zhang L, Cui H, Peng Y. DBN: A Dual-Branch Network for Detecting Multiple Categories of Mental Disorders. Information. 2025; 16(9):755. https://doi.org/10.3390/info16090755

Chicago/Turabian Style

Zhang, Longhao, Hongzhen Cui, and Yunfeng Peng. 2025. "DBN: A Dual-Branch Network for Detecting Multiple Categories of Mental Disorders" Information 16, no. 9: 755. https://doi.org/10.3390/info16090755

APA Style

Zhang, L., Cui, H., & Peng, Y. (2025). DBN: A Dual-Branch Network for Detecting Multiple Categories of Mental Disorders. Information, 16(9), 755. https://doi.org/10.3390/info16090755
