Article

Brain Network Analysis and Recognition Algorithm for MDD Based on Class-Specific Correlation Feature Selection

1 HDU-ITMO Joint Institute, Hangzhou Dianzi University, Hangzhou 310018, China
2 School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China
* Author to whom correspondence should be addressed.
Information 2025, 16(10), 912; https://doi.org/10.3390/info16100912
Submission received: 9 September 2025 / Revised: 11 October 2025 / Accepted: 16 October 2025 / Published: 17 October 2025
(This article belongs to the Section Artificial Intelligence)

Abstract

Major Depressive Disorder (MDD) is a high-risk mental illness that severely affects individuals across all age groups. However, existing research lacks comprehensive analysis and utilization of brain topological features, making it challenging to reduce redundant connectivity while preserving depression-related biomarkers. This study proposes a brain network analysis and recognition algorithm based on class-specific correlation feature selection. Leveraging electroencephalogram monitoring as a more objective MDD detection tool, this study employs tensor sparse representation to reduce the dimensionality of functional brain network time-series data, extracting the most representative functional connectivity matrices. To mitigate the impact of redundant connections, a feature selection algorithm combining topologically aware maximum class-specific dynamic correlation and minimum redundancy is integrated, identifying an optimal feature subset that best distinguishes MDD patients from healthy controls. The selected features are then ranked by relevance and fed into a hybrid CNN-BiLSTM classifier. Experimental results demonstrate classification accuracies of 95.96% and 94.90% on the MODMA and PRED + CT datasets, respectively, significantly outperforming conventional methods. This study not only improves the accuracy of MDD identification but also enhances the clinical interpretability of feature selection results, offering novel perspectives for pathological MDD research and clinical diagnosis.

1. Introduction

Major depressive disorder (MDD) is a high-risk mental illness that can affect individuals of all ages [1,2]. Current global estimates indicate that over 260 million people suffer from this condition, including a significant proportion of children and adolescents [3], and this number continues to rise. Unlike normal mood fluctuations experienced in daily life, MDD involves more severe and persistent psychological disturbances. Identifying its underlying neurophysiological mechanisms is therefore crucial for developing effective treatment approaches that can improve both diagnosis and recovery outcomes. Given its widespread impact and complex nature, research into MDD is important both for advancing scientific understanding and for addressing critical social and public health needs.
The study of MDD has garnered significant research attention, and detection methods for MDD are becoming increasingly diverse [4,5]. While questionnaire surveys remain the most commonly used approach, their reliability can be compromised by subjective factors. To overcome this limitation, researchers have increasingly turned to electrophysiological signals as a more objective means of MDD detection. Electroencephalography (EEG) serves as an objective and reliable physiological indicator that directly reflects human brain activity, and its advantages, including its high temporal resolution, cost-effectiveness, operational simplicity, and non-invasive nature, have made it a widely adopted tool for assessing neural activity in MDD research [6]. Prior research has leveraged EEG to investigate specific neural correlates of MDD, such as alterations in slow-wave sleep activity and deficits in working memory maintenance [7,8]. However, these approaches often overlook a fundamental property of the brain: it operates as a complex, integrated system where multi-channel signals exhibit high inter-channel correlation [9,10,11]. The potential of leveraging these topological brain features to enhance MDD detection remains largely underexplored. This oversight represents a critical gap, as models that neglect inter-channel dependencies may be limited in their diagnostic performance. Therefore, developing data-driven frameworks that explicitly incorporate these network characteristics is a crucial and promising direction for the advancement of MDD diagnostics.
The brain network theory has become a fundamental framework for understanding brain function, positing that complex cognitive processes emerge from interconnected neural networks [12]. Research demonstrates that both functional and structural brain networks can be effectively modeled as complex networks [13,14], with graph theory offering robust analytical tools for quantifying network topology. Mathematically, brain networks are represented as graphs where nodes correspond to brain regions and edges reflect inter-regional connectivity, enabling quantitative characterization of network architecture. Functional connectivity-based graph analysis has revealed significant insights into brain network properties [15], with growing evidence indicating abnormal network organization in MDD [16,17].
Despite their utility, graph-theoretical metrics derived from functional brain networks have significant limitations. A primary challenge is their vulnerability to spurious connections that reflect noise rather than genuine neural interactions. Given the high-dimensional nature of EEG data, the resulting functional networks are typically dense, making it difficult to distinguish biologically meaningful connectivity from this complex background activity. To address this, binarization is commonly employed to prune spurious connections and extract a core network structure [18]. However, conventional binarization techniques—primarily thresholding and density-based methods—suffer from a critical flaw: reliance on arbitrary parameter selection. Thresholding methods preserve connections based on a subjective cutoff value [19], which can introduce biases by altering network density across subjects [20]. Similarly, while density-based methods ensure consistent edge counts, the choice of network density itself remains subjective [21,22]. This subjectivity undermines the reliability of subsequent topological analyses. Therefore, developing an objective method to extract the most salient and robust connections from functional brain networks is a critical prerequisite for improving the diagnosis and understanding of depressive disorders.
To overcome these limitations, we propose a novel framework for the analysis and classification of MDD based on brain network topology. Our approach first leverages tensor decomposition to effectively process high-dimensional EEG data, extracting the most discriminative functional connectivity patterns from brain network time series. Subsequently, we employ a feature selection algorithm that maximizes class-specific dynamic correlations while minimizing redundancy. This allows for the identification of optimal feature sets that robustly distinguish individuals with MDD from healthy controls. Finally, these selected features, ranked by their relevance, are sequentially fed into a classifier to perform the diagnosis. The primary contributions of this work are threefold:
  • This study employs tensor decomposition for efficient dimensionality reduction of functional connectivity features, followed by a novel topologically aware class-specific feature selection method (TA-CSMDCCMR) that identifies discriminative biomarkers for MDD and HC groups, revealing cross-subject neural patterns with clinical potential.
  • We propose a specialized deep learning classifier that innovatively combines convolutional blocks for spatial feature extraction, a spatial-channel attention module for adaptive feature refinement, and a BiLSTM layer with temporal attention for capturing dynamic temporal dependencies.
  • Analysis of both intersection and union feature sets reveals that MDD patients exhibit both shared network abnormalities and individual-specific connectivity patterns, providing valuable biomarkers for personalized treatment.
The rest of this paper is structured as follows: Section 2 introduces the related background knowledge and elaborates on the details of our proposed methodology. Section 3 describes the datasets and experimental setup and presents the comprehensive results of our study. Section 4 provides an in-depth analysis and interpretation of the experimental findings. Finally, Section 5 summarizes the main contributions of this work and suggests potential directions for future research.

2. Related Work

This section reviews the prior work relevant to this study, covering EEG-based brain network analysis in MDD, feature selection and dimensionality reduction for brain networks, and deep learning for EEG-based MDD classification, and then summarizes the research gap addressed by this work.

2.1. EEG-Based Brain Network Analysis in Major Depressive Disorder

EEG-based analysis of brain functional networks offers a powerful, non-invasive method for elucidating the neuropathological mechanisms of MDD. This approach is predicated on the conceptualization of MDD not as a localized dysfunction but as a disorder of large-scale brain network organization. Foundational to this line of inquiry is the robust construction of functional networks from raw EEG signals. Addressing this challenge, Chen et al. [23] systematically investigated various connectivity and binarization methods, establishing that the combination of the Phase-Locking Value (PLV) with adaptive thresholding yields highly reliable and discernible networks. Once constructed, graph theory provides a quantitative framework to characterize the topological properties of these networks. Huang et al. [24] demonstrated that individuals with MDD exhibit significantly lower global efficiency, clustering coefficients, and node strength compared to healthy controls, indicating disrupted information processing at both local and global scales. Beyond a binary disease-versus-health classification, Baghernezhad et al. [25] incorporated cross-frequency coupling measures into their graph-theoretical analysis. Their model successfully distinguished between moderate and severe depression with high accuracy, suggesting that the extent of network disruption correlates with clinical severity.
While the aforementioned studies rely on resting-state EEG to probe intrinsic functional organization, MDD is fundamentally characterized by dysregulated emotional processing. Directly comparing these approaches, Earl et al. [26] found that functional connectivity features derived from an emotional video-watching task yielded superior classification accuracy over those from a resting state. Teng et al. [27] offered a more granular perspective by analyzing network dynamics during discrete neurophysiological stages of an emotional face recognition task. Their findings revealed that topological alterations, such as a shift towards a more random network configuration, were most prominent during specific cognitive phases like initial visual perception.
Converging evidence demonstrates that topological properties of EEG-based functional networks are potent biomarkers for MDD capable of reflecting clinical severity, tracking treatment response, and capturing dynamic emotional states. Despite this promise, a critical challenge remains: reliably extracting the most salient predictive features from the high-dimensional data inherent in whole-brain connectivity matrices.

2.2. Feature Selection and Dimensionality Reduction in Brain Network Analysis

The high dimensionality of features derived from EEG-based brain networks, often encompassing thousands of functional connections, poses a significant challenge known as the curse of dimensionality. Methodological approaches to this problem range from pragmatic hardware-level modifications to sophisticated algorithmic strategies.
Initial strategies addressed this challenge at a practical level. Shim et al. [28] investigated a hardware-level solution via channel reduction for MDD diagnosis, demonstrating that classification performance was maintained, even when reducing the EEG montage from 62 to just 19 channels. Beyond hardware modifications, the focus shifts to algorithmic feature selection, where the choice of method is critical. In a systematic comparison of six techniques, Hassan et al. [29] revealed that methods like Support Vector Machine-based Recursive Feature Elimination (SVM-RFE) significantly outperform others, underscoring the crucial interplay between the selection algorithm and the final classifier performance. Hag et al. [30] engineered a hybrid algorithm for mental stress recognition that integrated the Minimum Redundancy, Maximum Relevance (mRMR) criterion with Particle Swarm Optimization (PSO). This integrated strategy achieved substantial feature-space reduction while simultaneously enhancing classification accuracy. Sun et al. [31] introduced a novel clustering–fusion feature selection framework. By first employing hierarchical clustering to identify patient subgroups based on network topology, their method subsequently identifies features that are robustly representative of these distinct neurophysiological profiles. Ma et al. [32] proposed a class-specific feature selection criterion rooted in information theory. Their method, designed to isolate features with maximal relevance to a target class while minimizing redundancy, achieved superior classification performance over traditional approaches.
In summary, the field of feature selection for brain network analysis has demonstrated a clear methodological progression: from foundational channel reduction and direct algorithmic comparisons to sophisticated hybrid and heterogeneity-aware frameworks. Ma et al. [32] introduced a powerful class-specific paradigm. However, while this information-theoretic approach excels at identifying statistically relevant and non-redundant features, it does so without explicitly incorporating the rich topological information inherent to brain networks. Consequently, this can lead to a feature set that is statistically optimal but neurophysiologically uninterpretable.

2.3. Deep Learning for EEG-Based MDD Classification

The advent of deep learning has revolutionized biosignal analysis by enabling the automated discovery of intricate patterns from high-dimensional data. Li et al. [33] engineered a novel fusion feature, P-MSWC, to create robust functional connectivity matrices, which then served as input to a lightweight Convolutional Neural Network (CNN) to achieve exceptional classification accuracy. Saeedi et al. [34] constructed a directional connectivity matrix and demonstrated that a hybrid architecture combining a 1D CNN with a Long Short-Term Memory (LSTM) network yielded optimal results. This finding highlights the necessity of architectures capable of capturing both spatial relationships in connectivity and underlying temporal dependencies. Anik et al. [35] designed a deep 1D CNN that operated directly on EEG waveforms. Their model autonomously identified the Gamma brainwave as a highly effective biomarker for MDD, demonstrating the power of data-driven discovery. Xia et al. [36] engineered an end-to-end model incorporating a multi-head self-attention mechanism. This component automatically learns latent inter-channel connectivity before feature extraction, a step that traditionally relies on predefined algorithms. Wang et al. [37] proposed M-MDD, a sophisticated multi-task deep learning framework. This model concurrently executes two specialized tasks: a contrastive noise robustness task and a supervised feature extraction task. By optimizing these objectives jointly, the M-MDD framework learns a feature space that is not only highly discriminative for MDD but also exceptionally resilient to common signal artifacts.
In summary, the application of deep learning to EEG-based MDD classification has matured rapidly, progressing from models that classify pre-computed connectivity matrices to sophisticated end-to-end architectures that can autonomously learn latent network topologies and demonstrate resilience to real-world data imperfections.

2.4. Research Gap

A review of the existing literature identifies three primary gaps that this work aims to address. Firstly, current feature engineering methods for EEG-based MDD detection predominantly use pairwise connectivity metrics, thereby overlooking the crucial higher-order, multilinear interactions between brain regions. This limits the richness of the feature representation. Secondly, many studies employ generic feature selection techniques that are not optimized to identify the most discriminative, class-specific biomarkers from high-dimensional EEG data, leading to a risk of including irrelevant information. Finally, existing classification models, while advancing the field, often fail to create a truly synergistic architecture that can concurrently process spatio-temporal information and dynamically focus on the most relevant predictive patterns within the data.

3. Methods

This section provides a detailed description of the methods used in this study, with the overall process illustrated in Figure 1. First, a high-order tensor of brain functional connectivity is constructed using the phase-locking value (PLV), and tensor decomposition is employed to extract robust features. Next, a class-specific feature selection algorithm, TA-CSMDCCMR, is proposed to identify the most discriminative features. Finally, a CNN-BiLSTM hybrid neural network is employed to jointly capture the spatial and temporal dependency patterns within the features.

3.1. Connectivity Feature Extraction

In this paper, the PLV method was used to extract connectivity features. PLV provides a connectivity measure between each electrode pair at every time point in the EEG data [38]. It can therefore be used to observe transient changes in connectivity without defining a time window for analysis, and it reflects transient changes in brain state better than windowed measures. PLV is computed across multiple trials. Therefore, when extracting connectivity features, it is assumed that each trial of the same stimulus for the same subject elicits the same response, so that data of size channels × times × trials are available for the PLV calculation. The calculation formula of PLV is shown in Equation (1):
$$\mathrm{PLV}_t = \frac{1}{N} \left| \sum_{n=1}^{N} \exp\big( j\,\theta(t,n) \big) \right|, \tag{1}$$
where $n$ is the trial index, $N$ is the total number of trials, and $\theta(t,n)$ represents the instantaneous phase difference between the two signals of an electrode pair at time point $t$ in the $n$-th trial. After the PLV was calculated, the dynamic functional connectivity adjacency matrix of size channels × channels × times was obtained. Tensor sparse representation is then used to extract the dominant energy along the time dimension and thereby determine the significant features of this dynamic functional connectivity adjacency matrix.
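As a minimal illustration of Equation (1), the sketch below computes the PLV for every electrode pair from an array of instantaneous phases. It assumes the phases have already been extracted (for example, from the analytic signal of band-pass-filtered EEG); the function name `plv_over_trials` and the array layout are illustrative choices, not the authors' implementation.

```python
import numpy as np

def plv_over_trials(phases):
    """Phase-locking value for all channel pairs.

    phases: array of shape (channels, times, trials) holding instantaneous
            phases in radians (assumed to be precomputed).
    Returns an array of shape (channels, channels, times).
    """
    # Unit phasors exp(j * phase) for every channel, time point, and trial.
    z = np.exp(1j * phases)
    n_trials = phases.shape[-1]
    # z[a] * conj(z[b]) equals exp(j * (phase_a - phase_b)); summing over
    # trials (index n), dividing by N, and taking the magnitude gives Eq. (1).
    mean_phasor = np.einsum("atn,btn->abt", z, np.conj(z)) / n_trials
    return np.abs(mean_phasor)

# Toy usage: channel 1 is channel 0 with a constant phase lag, channel 2 is random.
rng = np.random.default_rng(0)
base = rng.uniform(-np.pi, np.pi, size=(50, 20))            # times x trials
phases = np.stack([base, base + 0.7,
                   rng.uniform(-np.pi, np.pi, size=(50, 20))])
plv = plv_over_trials(phases)                               # shape (3, 3, 50)
```

In this toy input the first two channels are perfectly phase-locked, so their PLV is 1 at every time point, whereas the pair involving the random channel stays well below 1.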
Tucker decomposition is a form of higher-order principal component analysis, also known as higher-order singular value decomposition (HOSVD) [39]. Figure 2 shows a third-order Tucker decomposition. It decomposes a tensor into a core tensor multiplied by a factor matrix along each mode. Thus, the Tucker decomposition of a third-order tensor $\mathcal{X} \in \mathbb{R}^{I \times J \times K}$ is expressed as follows:
$$\mathcal{X} \approx \mathcal{G} \times_1 A \times_2 B \times_3 C = \sum_{p=1}^{P} \sum_{q=1}^{Q} \sum_{r=1}^{R} g_{pqr}\, a_p \circ b_q \circ c_r = [\![ \mathcal{G};\, A, B, C ]\!], \tag{2}$$
where $A \in \mathbb{R}^{I \times P}$, $B \in \mathbb{R}^{J \times Q}$, and $C \in \mathbb{R}^{K \times R}$ are the orthogonal factor matrices and $\mathcal{G} \in \mathbb{R}^{P \times Q \times R}$ is called the core tensor. In terms of elements, the Tucker decomposition can be expressed as follows:
\begin{align}
x_{ijk} &\approx \sum_{p=1}^{P} \sum_{q=1}^{Q} \sum_{r=1}^{R} g_{pqr}\, a_{ip}\, b_{jq}\, c_{kr}, \tag{3} \\
i &= 1, \dots, I, \tag{4} \\
j &= 1, \dots, J, \tag{5} \\
k &= 1, \dots, K, \tag{6}
\end{align}
where $P$, $Q$, and $R$ are the numbers of components in each factor matrix; $A, B \in \mathbb{R}^{\text{channels} \times n}$ correspond to the EEG channel dimensions; and $C \in \mathbb{R}^{\text{times} \times t}$ corresponds to the time dimension. If $P$, $Q$, and $R$ are smaller than $I$, $J$, and $K$, then the core tensor $\mathcal{G}$ can be viewed as a compressed representation of the original tensor $\mathcal{X}$. If we define $\Phi(t) = \mathcal{G}(t) \times_1 A \times_2 B \in \mathbb{R}^{N \times N \times t}$, the following holds:
$$\mathcal{X} = \Phi(t) \times_3 C. \tag{7}$$
Since the subspaces obtained from Tucker decomposition are orthogonal, each column can be regarded as an eigenvector associated with an eigenvalue, with the columns ranked in descending order of importance. Therefore, it is only necessary to set $\varphi = \Phi(:,:,1) \in \mathbb{R}^{N \times N}$, in which case $\varphi$ captures the maximum energy across the subject and time dimensions in state $S$ and can be used as the representation of general functional connectivity in state $G$. Based on this, the functional connectivity representation $\varphi(m)$ under the target state $G(m)$ can be obtained.
The tensor ranks for the HOSVD were determined adaptively based on a stringent energy retention criterion. We established a threshold, as shown in Table 1, requiring the low-rank tensor approximation to preserve at least 99% of the total energy, as defined by the sum of squared singular values of the original data tensor. Consequently, for each tensor mode, the corresponding rank was defined as the minimum number of singular values required to capture this 99% energy threshold. This data-driven approach ensures an optimal trade-off between dimensionality reduction and the preservation of essential information, circumventing the need for arbitrary or manually-tuned rank selection.
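The energy-based rank selection described above can be sketched as follows. This is a minimal NumPy illustration of a truncated HOSVD in which each mode keeps the fewest singular vectors whose squared singular values reach the retention threshold. The helper names (`unfold`, `energy_rank`, `truncated_hosvd`, `reconstruct`) are hypothetical, and applying the 99% criterion independently to each mode unfolding is an assumption about how the threshold is used.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: bring `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def energy_rank(singular_values, keep=0.99):
    """Smallest rank whose cumulative squared singular values reach `keep`."""
    energy = np.cumsum(singular_values ** 2) / np.sum(singular_values ** 2)
    rank = int(np.searchsorted(energy, keep)) + 1
    return min(rank, len(singular_values))

def truncated_hosvd(tensor, keep=0.99):
    """Return (core, factors) with per-mode ranks chosen by energy retention."""
    factors = []
    for mode in range(tensor.ndim):
        u, s, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :energy_rank(s, keep)])
    core = tensor
    for mode, u in enumerate(factors):
        # Mode-n product with u.T projects this mode onto the kept subspace.
        core = np.moveaxis(np.tensordot(u.T, core, axes=(1, mode)), 0, mode)
    return core, factors

def reconstruct(core, factors):
    """Multiply the core by each factor matrix along its mode."""
    out = core
    for mode, u in enumerate(factors):
        out = np.moveaxis(np.tensordot(u, out, axes=(1, mode)), 0, mode)
    return out

# Toy usage on a random "channels x channels x times" tensor.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8, 40))
core, factors = truncated_hosvd(x, keep=0.99)
x_hat = reconstruct(core, factors)
```

Because each mode discards at most 1% of the energy, the squared relative reconstruction error of this sketch is bounded by the sum of the per-mode discarded fractions.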

3.2. Topologically Aware Class-Specific Feature Selection Algorithm

We enhance the CSMDCCMR framework [32] to explicitly account for the unique characteristics of neurophysiological data. The original CSMDCCMR, while proficient in evaluating feature relevance and informational redundancy, operates as a general-purpose algorithm and is agnostic to the intrinsic spatial topology of brain networks. In EEG-based functional connectivity, features are not abstract variables but represent connections between distinct anatomical locations. Consequently, a feature set highly concentrated in a single brain region may reflect a singular, localized neural phenomenon, which we define as topological redundancy. An algorithm that neglects this spatial dimension risks selecting a feature subset with poor spatial diversity, potentially failing to capture the distributed, large-scale network abnormalities characteristic of disorders such as MDD.
To address this limitation, we introduce the Topologically Aware CSMDCCMR (TA-CSMDCCMR). The core innovation of this algorithm is a redefined redundancy metric that simultaneously penalizes both informational overlap and topological proximity among features.

3.2.1. Feature Relevance and Dynamic Correlation

We retain the effective metrics for feature relevance and dynamic interaction from the original framework. The fundamental relevance of a candidate feature to a specific class is quantified by the class-specific mutual information. Furthermore, to capture the synergistic effects among features, the Class-Specific Dynamic Correlation Change (CSDCC) is employed to assess the complementary information provided by a candidate feature in the context of the already selected feature set:
$$\mathrm{CSDCC}(f_k) = \sum_{f_s \in S} \big[ I(f_k; c_j \mid f_s) + I(f_s; c_j \mid f_k) \big], \tag{8}$$
where $f_k$ is the candidate feature, $S$ is the already selected feature set, $c_j$ is the specific class, and $I(\cdot\,;\cdot)$ represents class-specific mutual information. CSDCC quantifies the increase in mutual information (synergy) between the candidate feature $f_k$ and the selected set $S$ with respect to the class $c_j$.
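To make the CSDCC term concrete, the following sketch estimates conditional mutual information from co-occurrence counts and sums the two conditional terms of Equation (8) over the selected features. It rests on two assumptions that the text does not specify: features are discretized, and the class $c_j$ is encoded as a binary one-vs-rest indicator. The function names are illustrative.

```python
from collections import Counter
from math import log2

def conditional_mutual_information(x, y, z):
    """Plug-in estimate of I(X; Y | Z) in bits for discrete sequences."""
    n = len(x)
    c_xyz = Counter(zip(x, y, z))
    c_xz = Counter(zip(x, z))
    c_yz = Counter(zip(y, z))
    c_z = Counter(z)
    total = 0.0
    for (xi, yi, zi), count in c_xyz.items():
        # p(x,y,z) * log2[ p(z) p(x,y,z) / (p(x,z) p(y,z)) ];
        # the 1/n factors cancel inside the logarithm.
        ratio = c_z[zi] * count / (c_xz[(xi, zi)] * c_yz[(yi, zi)])
        total += (count / n) * log2(ratio)
    return total

def csdcc(candidate, selected, class_indicator):
    """Sum over selected f_s of I(f_k; c_j | f_s) + I(f_s; c_j | f_k)."""
    return sum(
        conditional_mutual_information(candidate, class_indicator, f_s)
        + conditional_mutual_information(f_s, class_indicator, candidate)
        for f_s in selected
    )

# XOR toy case: neither feature alone predicts the class, but together they do.
f_s = [0, 0, 1, 1]
f_k = [0, 1, 0, 1]
is_class = [a ^ b for a, b in zip(f_s, f_k)]
print(csdcc(f_k, [f_s], is_class))  # 2.0
```

The XOR case is the extreme form of synergy: each conditional term equals one bit, so the candidate receives the maximum possible CSDCC score for a single selected feature.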

3.2.2. Defining Topologically Aware Redundancy

Our central modification is the formulation of a redundancy metric that incorporates spatial topology. Let a functional connectivity feature $f_k$ represent the connection between electrode pair $(i, j)$ and $f_s$ represent the connection between $(m, n)$. We define a topological penalty function $T(f_k, f_s)$ based on the node-sharing degree:
$$T(f_k, f_s) = \begin{cases} 1, & \text{if } \{i, j\} \cap \{m, n\} \neq \emptyset \\ 0, & \text{otherwise} \end{cases} \tag{9}$$
The value of $T(f_k, f_s)$ is 1 if the two distinct connections share a common electrode and 0 otherwise. This function penalizes features that are anatomically adjacent. This penalty is then integrated with the standard informational redundancy metric, which is the class-specific mutual information $I_{c_j}(f_k; f_s)$, to form the topologically aware redundancy term $\mathrm{RED}_{\mathrm{TA}}$:
$$\mathrm{RED}_{\mathrm{TA}}(f_k) = \sum_{f_s \in S} \big[ I_{c_j}(f_k; f_s) + \lambda \cdot T(f_k, f_s) \big], \tag{10}$$
where $\lambda$ ($\lambda > 0$) is a hyperparameter that balances the contribution of informational redundancy and topological redundancy. A larger $\lambda$ imposes a stronger penalty on features that share nodes with the already selected feature set.
The optimal value for the λ hyperparameter, which controls the strength of the topological penalty, was determined through a systematic grid-search procedure embedded within our cross-validation. For each candidate λ value from a predefined range, the complete feature selection and classification process was executed across all validation folds. The λ value that yielded the highest average classification performance was selected as the optimal setting for the final model. It is important to clarify that, as a greedy forward selection method, the TA-CSMDCCMR algorithm does not converge in the sense of an iterative optimization process. Instead, it deterministically terminates upon selection of the predefined number of features.

3.2.3. The TA-CSMDCCMR Criterion

By integrating these components, the final evaluation criterion for TA-CSMDCCMR is formulated. The algorithm iteratively selects the candidate feature $f_k$ that maximizes the objective function $J_{\mathrm{TA}}(f_k)$, which balances relevance, synergy, and the new redundancy term. The objective function is defined as follows:
$$J_{\mathrm{TA}}(f_k) = I(f_k; c_j) + \frac{1}{|S|}\, \mathrm{CSDCC}(f_k) - \frac{1}{|S|}\, \mathrm{RED}_{\mathrm{TA}}(f_k). \tag{11}$$
Substituting the definitions from Equations (8) and (10), the full criterion is expressed as follows:
$$J_{\mathrm{TA}}(f_k) = I(f_k; c_j) + \frac{1}{|S|} \sum_{f_s \in S} \big[ I(f_k; c_j \mid f_s) + I(f_s; c_j \mid f_k) \big] - \frac{1}{|S|} \sum_{f_s \in S} \big[ I_{c_j}(f_k; f_s) + \lambda \cdot T(f_k, f_s) \big]. \tag{12}$$
In essence, this criterion seeks a feature that is highly relevant to the target class, generates maximum synergistic information with already selected features, and is minimally redundant with them in both the informational and topological domains. In addition, our method employs a Sequential Forward Search (SFS) strategy. For each class $c_j$, the algorithm begins by selecting the feature with the highest initial relevance $I(f_k; c_j)$. Subsequently, it iteratively adds the feature that maximizes the criterion in Equation (12) until the desired number of features is selected. This approach ensures the selection of a diverse and informative set of biomarkers for MDD classification.
In conclusion, as shown in Algorithm 1, the algorithm begins by iterating through each class $c_j$ using a for loop (line 1) to perform class-specific feature selection. Within each loop, it initializes an empty set for selected features ($S_j$, line 2) and a candidate set ($F_j$) containing all available features (line 3). The first feature is identified as the one with the highest mutual information ($I$) with the current class $c_j$ (line 4); it is then added to $S_j$ (line 5) and removed from $F_j$ (line 6). Subsequently, a while loop (line 7) executes until the desired number of features ($\delta_j$) for that class is selected. In each iteration, the algorithm calculates a comprehensive evaluation score ($J_{\mathrm{TA}}$) for every remaining candidate feature ($f_k$, line 8); this score is designed to maximize dynamic correlation while minimizing redundancy, which includes a penalty for topological proximity. The feature that maximizes this $J_{\mathrm{TA}}$ score is chosen (line 9), added to the selected set ($S_j$, line 10), and removed from the candidate set ($F_j$, line 11). This iterative selection process (lines 7–12) is repeated until the feature count is met, and the outer loop (lines 1–13) concludes once all classes have been processed.
Algorithm 1 Topologically Aware Class-Specific Maximal Dynamic Correlation Change and Minimal Redundancy (TA-CSMDCCMR) Algorithm.
Require: A training set characterized by a full set of features $F = \{f_1, f_2, \dots, f_d\}$ and the class variable $C$ with $m$ classes $\{c_1, c_2, \dots, c_m\}$; a family of the desired numbers of selected features for each class $(\delta_1, \delta_2, \dots, \delta_m)$; a hyperparameter $\lambda > 0$ for the topological penalty.
Ensure: A family of class-specific selected feature subsets $(S_1, S_2, \dots, S_m)$
1: for $j = 1$ to $m$ do
2:     $S_j = \emptyset$ {Initialize empty selected feature set for class $c_j$}
3:     $F_j = F$ {Initialize available candidate features for class $c_j$}
4:     $f_{\max} = \arg\max_{f_k \in F_j} I(f_k; c_j)$
5:     $S_j = S_j \cup \{f_{\max}\}$
6:     $F_j = F_j \setminus \{f_{\max}\}$
7:     while $|S_j| < \delta_j$ do
8:         Calculate $J_{\mathrm{TA}}(f_k)$ for each $f_k \in F_j$ as in Equation (12)
9:         $f_{\max} = \arg\max_{f_k \in F_j} J_{\mathrm{TA}}(f_k)$
10:        $S_j = S_j \cup \{f_{\max}\}$
11:        $F_j = F_j \setminus \{f_{\max}\}$
12:    end while
13: end for
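The inner loop of Algorithm 1 for a single class can be sketched as below. To keep the example short, the information-theoretic quantities are assumed to be precomputed and passed in as arrays, so the sketch only illustrates the greedy forward search and the topological penalty. The function and argument names are hypothetical, and ties are broken by taking the first maximum, which the text does not specify.

```python
import numpy as np

def shares_electrode(edge_a, edge_b):
    """Topological penalty T: 1 if two connections share an electrode, else 0."""
    return 1.0 if set(edge_a) & set(edge_b) else 0.0

def ta_greedy_select(relevance, synergy, redundancy, edges, n_select, lam=0.5):
    """Greedy forward selection for one class c_j.

    relevance[k]     : I(f_k; c_j)
    synergy[k, s]    : I(f_k; c_j | f_s) + I(f_s; c_j | f_k)
    redundancy[k, s] : I_{c_j}(f_k; f_s)
    edges[k]         : electrode pair (i, j) of feature f_k
    """
    relevance = np.asarray(relevance, dtype=float)
    synergy = np.asarray(synergy, dtype=float)
    redundancy = np.asarray(redundancy, dtype=float)
    candidates = list(range(len(relevance)))
    # Lines 4-6: start with the most relevant feature.
    first = int(np.argmax(relevance))
    selected = [first]
    candidates.remove(first)
    # Lines 7-12: add the candidate that maximizes J_TA until enough are chosen.
    while len(selected) < n_select and candidates:
        best_k, best_score = None, -np.inf
        for k in candidates:
            syn = sum(synergy[k, s] for s in selected)
            red = sum(redundancy[k, s]
                      + lam * shares_electrode(edges[k], edges[s])
                      for s in selected)
            score = relevance[k] + (syn - red) / len(selected)
            if score > best_score:
                best_k, best_score = k, score
        selected.append(best_k)
        candidates.remove(best_k)
    return selected

# Toy usage: feature 1 is slightly more relevant than feature 2 but shares
# electrode 0 with the first pick, so the penalty steers selection to feature 2.
rel = [0.9, 0.8, 0.7, 0.1]
zeros = np.zeros((4, 4))
edges = [(0, 1), (0, 2), (2, 3), (1, 3)]
print(ta_greedy_select(rel, zeros, zeros, edges, n_select=2, lam=0.0))  # [0, 1]
print(ta_greedy_select(rel, zeros, zeros, edges, n_select=2, lam=0.5))  # [0, 2]
```

The two calls differ only in $\lambda$, which isolates the effect of the topological term: with the penalty switched off the search behaves like a purely informational criterion, and with it switched on the selected connections spread across different electrodes.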

3.3. Classifier

The capacity of deep learning to derive powerful non-linear representations from complex data has established it as a leading paradigm for decoding brain states from neurophysiological signals. EEG signals are high-dimensional time series in which discriminative patterns manifest not only as local, transient events but also in their long-range temporal evolution. To address this dual nature in the context of MDD classification, we employed a hybrid architecture integrating a Convolutional Neural Network (CNN) with a Bidirectional Long Short-Term Memory (BiLSTM) network.
In this model, the CNN and BiLSTM modules perform distinct yet synergistic roles. CNNs, operating on local receptive fields, are ideally suited to capture transient neural activities. The sequence of high-level feature maps generated by the CNN is then passed to the BiLSTM module. This component is engineered to model the temporal dynamics inherent in these features. Recognizing that the pathophysiology of MDD reflects sustained alterations in brain dynamics rather than isolated neural events, the BiLSTM architecture processes the sequence in both forward and backward directions. By integrating information from both past and future contexts, the BiLSTM provides a more holistic and robust representation of the neural dynamics.
Compared to traditional machine learning methods, this model does not rely on manually designed kernel functions and feature transformations but, instead, learns deep feature representations with high discriminative power through an end-to-end training approach, making it particularly suitable for handling brain functional connectivity data with complex spatial structures and potential temporal dynamics. In addition, a standalone recurrent architecture would be computationally inefficient and less effective at learning the intricate local spatio-temporal patterns at which CNNs excel. While contemporary architectures like Transformers excel at long-sequence modeling, they typically demand substantially larger datasets to avoid overfitting and incur greater computational costs.
During forward propagation, the input tensor $X \in \mathbb{R}^{B \times C \times H \times W}$ passes through three convolutional blocks that extract multi-level spatial features; each block incorporates convolution, batch normalization, ReLU activation, max pooling, and dropout operations. The procedural steps are expressed as follows:
$$Z_i = \mathrm{Dropout}(\mathrm{MaxPool}(\mathrm{ReLU}(\mathrm{BatchNorm}(\mathrm{Conv}_{3 \times 3}(Z_{i-1}))))), \quad i = 1, 2, 3,$$
where $Z_i$ represents the $i$-th layer’s feature. Subsequently, to enhance the model’s ability to focus on key features, a lightweight attention mechanism is applied to the output features of the third convolutional block. As shown in Equation (14), this mechanism generates a spatial attention weight map through a convolutional layer, followed by a sigmoid function.
$$A = \sigma(\mathrm{Conv}_{1 \times 1}(Z_3)),$$
$$Z_{\mathrm{attn}} = A \odot Z_3,$$
where $\sigma$ denotes the sigmoid function, $\odot$ denotes element-wise multiplication, and $Z_{\mathrm{attn}}$ represents the feature map after attention weighting.
Next, the weighted features undergo global average pooling, converting them into a sequence format by compressing all spatial information of each feature channel into a scalar value:
$$F = \mathrm{AdaptiveAvgPool2D}(Z_{\mathrm{attn}}), \quad F \in \mathbb{R}^{B \times 128}.$$
Subsequently, the resulting feature sequence is input into the BiLSTM, capturing contextual information from both forward and backward directions.
$$H_{\mathrm{lstm}}, (h_n, c_n) = \mathrm{BiLSTM}(F),$$
where $h_n$ denotes the hidden state at the final time step and $c_n$ denotes the cell state at the final time step.
Next, a simple attention mechanism is applied to the output of the BiLSTM ($H_{\mathrm{lstm}}$) to aggregate information from all time steps into a context vector, allowing the model to weigh the importance of different time steps:
$$W = \mathrm{softmax}(H_{\mathrm{lstm}}), \quad C = \sum_{i} W_i H_{\mathrm{lstm}},$$
where $C$ is the context vector obtained by summing the features from all time steps weighted by their respective importance, encapsulating the essential information of the entire sequence.
Finally, the resulting context vector (C) is input into a fully connected layer for classification, ultimately yielding prediction scores for each sample across various categories, thereby laying the groundwork for subsequent probability computations and evaluation metrics.
$$Y = W_c C + b_c.$$
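The forward pass described above can be sketched in PyTorch, the framework used in this study. This is a minimal illustration under stated assumptions, not the authors' implementation: the class names `ConvBlock` and `CnnBiLstmSketch`, the channel widths 32 and 64, the pooling size, the dropout rate, the hidden size, the single-channel attention map, the learned scoring layer placed before the softmax, and the reading of the 128 pooled values as a length-128 sequence are all choices made here for illustration.

```python
import torch
import torch.nn as nn


class ConvBlock(nn.Module):
    """Conv3x3 -> BatchNorm -> ReLU -> MaxPool -> Dropout (one Z_i step)."""

    def __init__(self, in_ch, out_ch, p_drop=0.3):  # p_drop is an assumed value
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(),
            nn.MaxPool2d(2),  # assumed 2x2 pooling; halves H and W
            nn.Dropout(p_drop),
        )

    def forward(self, x):
        return self.body(x)


class CnnBiLstmSketch(nn.Module):
    def __init__(self, in_ch=1, n_classes=2, hidden=64):
        super().__init__()
        # Only the final width of 128 appears in the text; 32 and 64 are assumed.
        self.blocks = nn.Sequential(
            ConvBlock(in_ch, 32), ConvBlock(32, 64), ConvBlock(64, 128)
        )
        self.spatial_attn = nn.Conv2d(128, 1, kernel_size=1)  # Conv1x1 for A
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden,
                            batch_first=True, bidirectional=True)
        self.score = nn.Linear(2 * hidden, 1)  # assumed scoring layer
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):
        z3 = self.blocks(x)                           # (B, 128, H/8, W/8)
        a = torch.sigmoid(self.spatial_attn(z3))      # (B, 1, H/8, W/8)
        z_attn = a * z3                               # broadcast over channels
        f = self.pool(z_attn).flatten(1)              # (B, 128)
        seq = f.unsqueeze(-1)                         # (B, 128, 1), assumed layout
        h_lstm, (h_n, c_n) = self.lstm(seq)           # h_lstm: (B, 128, 2*hidden)
        w = torch.softmax(self.score(h_lstm), dim=1)  # weights over time steps
        c = (w * h_lstm).sum(dim=1)                   # context vector C
        return self.fc(c)                             # Y = W_c C + b_c
```

For a dummy batch of shape (4, 1, 16, 16), the sketch returns a (4, 2) tensor of class scores; a softmax over the last dimension would turn these into class probabilities.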

4. Results

4.1. Dataset

The selection of datasets for this study was deliberately guided by several criteria to ensure the validity, generalizability, and comparability of our findings. Our primary requirements were that datasets must (i) be publicly accessible to support scientific reproducibility; (ii) contain multi-channel, resting-state Electroencephalography (EEG) data; and (iii) include clearly delineated cohorts of patients with MDD and healthy controls. Accordingly, we selected two prominent MDD datasets: MODMA and PRED + CT.

4.1.1. MODMA

The publicly available dataset provided by Cai et al. [40] was utilized to evaluate the depression prediction method proposed in this study. The dataset, published by the UAIS laboratory of Lanzhou University in 2020, contains EEG data from patients with clinical depression, as well as data from normal controls. The EEG dataset includes resting EEG signals collected from 53 subjects using the HydroCel Geodesic Sensor Net (HCGSN) with 128 channels. The 53 participants consisted of 24 major depressive patients and 29 normal controls. The sampling rate was 250 Hz. The EEG recordings were filtered between 1 Hz and 40 Hz by a Hamming windowed Sinc FIR filter, which could discard the electrical interference from the 50 Hz frequency noise and the baseline drift. Then, electro-oculography (EOG) and electromyography (EMG) artifacts were removed. The position of the removed bad channel was interpolated by spherical interpolation. Then, the processed EEG data were re-referenced against REST [41].

4.1.2. PRED + CT

The other dataset used in this study is available on the PRED + CT website [42] and originally contained EEG signals from 121 subjects with an average age of 18.86 ± 1.19 years. However, practical information was missing for two subjects, so they were removed from the dataset. This study was conducted using the data of 44 subjects with depression (12 males and 32 females) and high BDI scores (≥13) and 75 control subjects (35 males and 40 females) with low BDI scores (<7). Participants were recruited from introductory psychology courses based on their BDI scores and were selected to ensure they had no prior history of head trauma, epileptic seizures, or psychoactive medication usage. The data were collected using a 64-channel EEG system with electrode settings based on the 10–20 standards for EEG recording. The sampling frequency was set at 500 Hz during the resting state. All participants provided written consent, and the study protocol was approved by the University of Arizona.

4.2. Data Processing

In the MODMA dataset, the EEG signals of each subject were divided into seven segments, each with a length of 10,000 samples, to increase the sample size for training and testing. Afterwards, segmented PLV calculations were performed on the 371 obtained signals to obtain 128 × 128 × 10 PLV tensor features, where 128 represents the number of channels and 10 represents the number of experiments. To obtain the maximum energy feature in the resting state, Tucker decomposition sparse representation was used to obtain connection features for feature selection ($\phi \in \mathbb{R}^{128 \times 128}$).
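The segmentation and PLV steps can be illustrated with a short NumPy sketch. It uses the standard PLV definition, the magnitude of the time-averaged phase-difference phasor, and takes instantaneous phases as input (for example, from a Hilbert transform computed beforehand). The function names and the split of each segment into 10 equal windows to form the third tensor axis are assumptions made for illustration; the text does not specify how the 10 slices are formed.

```python
import numpy as np


def segment(signal, n_segments=7, seg_len=10_000):
    """Cut a (channels, samples) recording into consecutive equal-length segments."""
    if signal.shape[1] < n_segments * seg_len:
        raise ValueError("recording too short for the requested segmentation")
    return [signal[:, i * seg_len:(i + 1) * seg_len] for i in range(n_segments)]


def plv_matrix(phases):
    """PLV between all channel pairs.

    phases: (channels, samples) array of instantaneous phases in radians.
    """
    unit = np.exp(1j * phases)  # unit phasors e^{i*phi}
    # Entry (a, b) is |mean_t exp(i * (phi_a(t) - phi_b(t)))|.
    return np.abs(unit @ unit.conj().T) / phases.shape[1]


def plv_tensor(phases, n_windows=10):
    """Stack per-window PLV matrices into a (channels, channels, n_windows) tensor.

    Splitting a segment into equal windows is an assumption of this sketch.
    """
    windows = np.array_split(phases, n_windows, axis=1)
    return np.stack([plv_matrix(w) for w in windows], axis=-1)
```

With 53 subjects and seven segments each, `segment` yields the 371 signals mentioned above, and a 128-channel segment gives a 128 × 128 × 10 tensor from `plv_tensor`.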
In the PRED + CT dataset, the signal is similarly segmented to increase the number of samples. Segmented PLV calculations are then performed directly on each subject to obtain the maximum energy connectivity features, yielding the connection features used for feature selection ($\phi \in \mathbb{R}^{64 \times 64}$). After obtaining the connection features, only the upper triangular part of the PLV adjacency matrix is considered, since the matrix is symmetric. This gives 8128 candidate connections for the MODMA dataset (128 × 127/2) and 2016 for the PRED + CT dataset (64 × 63/2). These connection features were input into the TA-CSMDCCMR algorithm to select the top 50 features for the MDD and HC classes.
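The figures 8128 and 2016 correspond to the number of entries strictly above the diagonal of a 128 × 128 and a 64 × 64 matrix, respectively. A minimal NumPy sketch of this vectorization step follows; the function name is hypothetical.

```python
import numpy as np


def upper_triangle_features(plv):
    """Flatten the strict upper triangle of a symmetric (n, n) PLV matrix."""
    rows, cols = np.triu_indices(plv.shape[0], k=1)  # k=1 skips the diagonal
    return plv[rows, cols]
```

Each returned entry corresponds to one electrode pair, so a selected feature index can be mapped back to a specific connection for visualization.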

4.3. Experimental Settings

All experiments were conducted on an NVIDIA GeForce RTX 3090 GPU. The data processing pipeline leveraged MATLAB R2024a for signal preprocessing and Python 3.8 with PyTorch 1.11.0 for the deep learning framework. Raw EEG signals were first bandpass-filtered to isolate the alpha band (8–13 Hz) using a 50th-order FIR filter. Subsequently, artifacts were removed by identifying and eliminating the top five independent components via Independent Component Analysis (ICA). Following this preprocessing, 50 class-specific features were selected as input for the model.
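As an illustration of the alpha-band filtering step, the sketch below designs a 50th-order (51-tap) windowed-sinc band-pass filter in NumPy. The study performed this step in MATLAB, and the window type is not stated for this filter, so the Hamming window, the function names, and the default sampling rate used in the usage note are assumptions.

```python
import numpy as np


def bandpass_fir(low_hz, high_hz, fs, order=50):
    """Windowed-sinc band-pass taps: difference of two ideal low-pass responses."""
    n = np.arange(order + 1) - order / 2  # tap indices centred on zero

    def lowpass(fc):
        return 2 * fc / fs * np.sinc(2 * fc / fs * n)

    # Assumed Hamming window; taps are not rescaled to unit passband gain.
    return (lowpass(high_hz) - lowpass(low_hz)) * np.hamming(order + 1)


def gain(taps, freq_hz, fs):
    """Magnitude of the filter's frequency response at one frequency."""
    k = np.arange(len(taps))
    return abs(np.sum(taps * np.exp(-2j * np.pi * freq_hz * k / fs)))
```

For the 250 Hz MODMA recordings, `taps = bandpass_fir(8, 13, fs=250)` can be applied to one channel with `np.convolve(channel, taps, mode="same")`. The taps are symmetric, so the filter has linear phase.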
The hyperparameters are shown in Table 1. The CNN-BiLSTM model was trained using the AdamW optimizer with an initial learning rate of 0.001 and a weight decay of $1 \times 10^{-4}$. A cosine annealing schedule was employed to dynamically adjust the learning rate during training. Models were trained with a batch size of 16 for a maximum of 150 epochs. To ensure generalization and prevent overfitting, we implemented an early stopping protocol with a patience of 20 epochs and evaluated model performance using a 10-fold cross-validation scheme. This rigorous validation ensures the robustness and reliability of our reported results.
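The learning-rate schedule and stopping rule can be made concrete with a framework-independent sketch. It uses the standard closed form of cosine annealing, $\eta_t = \eta_{\min} + \tfrac{1}{2}(\eta_{\max} - \eta_{\min})(1 + \cos(\pi t / T_{\max}))$. Setting $T_{\max}$ to the 150-epoch budget, using a minimum rate of zero, and monitoring validation loss for early stopping are assumptions, since the text does not state them.

```python
import math


def cosine_lr(epoch, base_lr=1e-3, t_max=150, eta_min=0.0):
    """Cosine-annealed learning rate at a given epoch (t_max, eta_min assumed)."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max))


class EarlyStopping:
    """Stop once the monitored loss has not improved for `patience` epochs."""

    def __init__(self, patience=20):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In PyTorch, the same configuration would typically be expressed with `torch.optim.AdamW(params, lr=1e-3, weight_decay=1e-4)` and `torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=150)`.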

4.4. Classifier and Evaluation Index

After obtaining the top 50 most relevant features for MDD and HC categories using the TA-CSMDCCMR algorithm, a total of 100 features were deduplicated and merged to form the final feature subset, which served as the input for the classifier. This study utilized a CNN-BiLSTM hybrid model based on deep learning for final classification recognition. As described in Section 3.3, this model consists of a CNN and BiLSTM, enabling the joint extraction of spatial patterns and dynamic temporal dependencies of functional brain network features.
In the data processing pipeline, this study segmented the EEG signals of each subject into seven parts, a technique aimed at increasing the sample size to meet the data requirements of the deep learning model. To prevent data leakage, the feature selection, model training, and 10-fold cross-validation [43] were conducted on a subject-wise basis and confined to the training data of each fold. This ensured that the model was evaluated for its ability to generalize to entirely unseen subjects rather than for its ability to memorize subject-specific features, thereby mitigating the risk of artificially inflated performance metrics arising from intra-subject data correlations. In addition, various metrics, including the sensitivity, specificity, and AUC, were computed to comprehensively assess the classification efficacy of the model. The formulas for these metrics are expressed as follows:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \quad \mathrm{Sensitivity} = \frac{TP}{TP + FN}, \quad \mathrm{Specificity} = \frac{TN}{TN + FP},$$
where $TP$, $TN$, $FP$, and $FN$ denote true positives, true negatives, false positives, and false negatives, respectively. The AUC was computed as the area under the receiver operating characteristic curve plotted based on the true-positive rate against the false-positive rate at various classification thresholds.
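The evaluation protocol can be sketched in plain Python: a subject-wise fold generator that keeps all segments of a subject on one side of each split, followed by the metrics defined above. The function names are hypothetical, the fold assignment is not stratified by class, and the AUC uses the rank-based (Mann-Whitney) formulation, which equals the area under the ROC curve.

```python
import random


def subject_wise_folds(subject_ids, n_folds=10, seed=0):
    """Yield (train_idx, test_idx) pairs in which no subject spans both sides."""
    subjects = sorted(set(subject_ids))
    random.Random(seed).shuffle(subjects)
    for k in range(n_folds):
        held_out = set(subjects[k::n_folds])  # every n_folds-th subject
        test_idx = [i for i, s in enumerate(subject_ids) if s in held_out]
        train_idx = [i for i, s in enumerate(subject_ids) if s not in held_out]
        yield train_idx, test_idx


def confusion_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def auc(scores, labels):
    """Probability that a positive sample outranks a negative one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

To respect the leakage constraint described above, feature selection would be fitted on `train_idx` only within each fold and then applied unchanged to `test_idx`.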

4.5. Experimental Results

As shown in Table 2, the classification model based on TA-CSMDCCMR feature selection and CNN-BiLSTM proposed in this study demonstrates outstanding and robust overall performance on two independent public datasets.
On the MODMA dataset, the proposed method achieved a high classification accuracy of 95.96%, while the sensitivity and specificity were 93.4% and 97.85%, respectively. This indicates that the model not only exhibits a strong capability in detecting depression patients but also demonstrates remarkable precision in distinguishing healthy controls, thereby minimizing the misdiagnosis of healthy individuals as patients, which is crucial for the auxiliary screening of depression. Additionally, the AUC value reached 95.7%, further confirming the model’s strong discriminative ability and high reliability.
On the PRED + CT dataset, the model similarly displayed outstanding performance, utilizing a 64-channel acquisition device that differs from that used in the MODMA dataset. Under these conditions, the model maintained stable and exceptional performance, demonstrating the good generalization ability and robustness of this research method across different data sources. Furthermore, the model achieved an AUC value of 97.10% on this dataset, which also indicates its excellent category discrimination capability. Moreover, the low mean squared error and mean absolute error across both datasets indicate that the predictions closely align with the true values, demonstrating high predictive confidence and reliable results.
In summary, this research method achieved high-performance classification results across datasets, particularly maintaining excellent sensitivity while ensuring extremely high specificity. This series of metrics fully validates the effectiveness, advancement, and potential clinical translational value of the combination of the TA-CSMDCCMR feature selection algorithm and the CNN-BiLSTM classifier.

4.6. Comparison Experiments

To demonstrate the superiority of our proposed method in depression classification performance, a comparison with state-of-the-art studies was conducted on the MODMA dataset, as shown in Table 3. The results confirm that our approach outperforms traditional machine learning classifiers and most deep learning neural network methods. This comparative analysis highlights the superior accuracy and analytical capability of our method in depressive disorder identification. On the PRED + CT dataset, existing research consists predominantly of deep learning approaches, so our comparative analysis there focuses on benchmarking against other state-of-the-art deep learning models.
Whether comparing with state-of-the-art studies across both MODMA and PRED + CT datasets, analyzing the classification performance differences between intersection and union operations, or evaluating the advantages of our feature selection algorithm, all experimental results consistently demonstrate the effectiveness, robustness, and superiority of the proposed method in depression recognition. Furthermore, our approach provides a novel electrophysiological signal-based tool for investigating the similarities and differences in pathogenesis among depressive disorder patients, offering researchers multidimensional exploration pathways. The experimental results are shown in Table 3 and Table 4.

4.7. Ablation Experiments

To validate the necessity of Tucker decomposition for feature extraction from our tensor-based data, we benchmarked its performance against two conventional approaches: direct temporal averaging of the PLV matrix and dimensionality reduction using Principal Component Analysis (PCA). As detailed in Table 5, our decomposition-based method significantly outperformed both alternatives. Unlike linear methods such as PCA that require data vectorization, which disrupts the structure, Tucker decomposition preserves and models the interactions across multiple dimensions. This capacity to capture the higher-order statistical dependencies inherent in neurophysiological tensors allows for the extraction of more discriminative features, an advantage not afforded by traditional dimensionality reduction techniques.
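The difference between temporal averaging and a decomposition that keeps the tensor structure can be illustrated with a small NumPy sketch. It is not the sparse Tucker representation used in this study: it applies only the mode-3 step of a higher-order SVD, projecting the window axis onto its leading singular vector, which is a Tucker-style truncation to rank 1 along that mode. The function names are hypothetical.

```python
import numpy as np


def temporal_average(tensor):
    """Baseline: mean of the (n, n, K) PLV tensor over its window axis."""
    return tensor.mean(axis=2)


def dominant_mode3_slice(tensor):
    """Project the window axis onto its leading singular vector.

    Returns the (n, n) matrix carrying the most energy along mode 3;
    its overall sign is arbitrary, as with any SVD.
    """
    k = tensor.shape[2]
    unfolded = np.moveaxis(tensor, 2, 0).reshape(k, -1)  # mode-3 unfolding
    u, _, _ = np.linalg.svd(unfolded, full_matrices=False)
    return np.tensordot(tensor, u[:, 0], axes=([2], [0]))
```

Whereas averaging weights every window equally, the projection weights windows by how strongly they express the dominant connectivity pattern. PCA, by contrast, would first flatten each sample into a vector and discard the channel-by-channel layout.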
We further evaluated the efficacy of our proposed TA-CSMDCCMR algorithm by comparing it against ReliefF, a classic and widely used feature selection method. The results confirm that TA-CSMDCCMR achieves significantly superior performance. The underlying reason for this advantage lies in the fundamentally different strategies of the two algorithms. ReliefF operates as a global feature selector, seeking a single optimal subset that provides the best average discrimination across all classes. In contrast, TA-CSMDCCMR employs a class-specific strategy, identifying distinct feature sets tailored to the unique characteristics of the MDD and HC classes, respectively. This tailored approach is crucial for isolating the specific neurophysiological markers that differentiate pathological from healthy brain states, thereby enabling a more precise classification.
We next isolated the impact of the Topology Awareness (TA) penalty term. While its removal resulted in only a minor decrease in aggregate accuracy—from 95.96% to 95.78%—its mechanism provides a refinement to the feature selection process. The TA term functions by penalizing topological redundancy—specifically, by down-weighting the mutual information between features from spatially proximal brain regions. This constraint discourages the selection of redundant local connections and, instead, prioritizes the distributed, cross-lobe pathological features more indicative of MDD. Consequently, without this term, the model is more prone to retaining spatially clustered, non-pathological features, slightly diluting the influence of key biomarkers. This confirms that the TA term, while subtle in its effect on the final accuracy metric, is vital for enhancing model interpretability and ensuring a focus on functionally relevant, large-scale network abnormalities.
To verify the contributions of each module in the CNN-BiLSTM hybrid architecture, the performance of the model was systematically evaluated after removing key components. As shown in Table 5, when only the BiLSTM module was used, the model’s accuracy dropped by approximately six percentage points due to its inability to effectively extract spatial topological patterns from the functional connectivity matrix. In contrast, when only the CNN module was retained, the model could capture spatial features but struggled to model long-range temporal dependencies, resulting in a nearly eight-percentage-point decrease in accuracy. These results indicate that the CNN and BiLSTM modules are responsible for extracting spatial structures and temporal context, respectively, and their synergistic effect, combined with the TA term’s topological constraint, significantly enhances the model’s capacity to express multi-dimensional brain network features. This strongly supports the rationale and necessity of the hybrid architecture design integrated with topological awareness.

4.8. Feature Selection Results

After identifying the top 50 most relevant features for MDD and HC groups, the selected electrodes with connections were visualized, as shown in Figure 3, to enhance interpretability. In the PRED + CT dataset, substantial consistency was observed between the electrodes associated with MDD and HC features, providing a reliable basis for subsequent analysis of functional brain abnormalities in depression. However, the MODMA dataset showed notable differences in electrode distribution compared to PRED + CT—specifically, the absence of prefrontal electrodes in the left hemisphere and significantly increased electrode counts in the right parietal and occipital lobes. These variations are attributed to differences between 64-channel and 128-channel EEG acquisition systems. The implications of these feature selection results for understanding depression pathogenesis are thoroughly discussed in the following section.

4.9. MDD Classification Results

Following feature selection on both the MODMA and PRED + CT datasets, substantial overlap was observed between the feature sets selected for the MDD and HC groups. This observation prompted the use of both the union and intersection of the selected features as input to a CNN-BiLSTM classifier for depression recognition. The cross-subject classification achieved accuracies of 91.32% and 87.86% on the MODMA and PRED + CT datasets, respectively, where the significant accuracy improvement with limited features demonstrates the effectiveness of our proposed feature selection algorithm. However, when exceeding 20–30 features, the accuracy began fluctuating, with occasional sharp drops upon the incorporation of certain features that were attributable to individual variability between depressed patients and healthy controls. Notably, using feature intersections instead of unions yielded 2% higher accuracy with more stable performance, as intersections captured more precise depression-specific biomarkers containing discriminative features uniquely associated with either MDD or HC groups, which was further confirmed through subsequent analysis of the involved brain regions. The cross-subject results of depression classification are displayed in Figure 4 below.
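The union and intersection operations on the two class-specific feature sets can be written compactly. In this sketch each feature is assumed to be identified by an electrode-pair tuple, and the function name is hypothetical.

```python
def merge_feature_sets(mdd_features, hc_features):
    """Return (union, intersection) of two ranked feature lists, order preserved."""
    # dict.fromkeys removes duplicates while keeping first-seen order.
    union = list(dict.fromkeys(list(mdd_features) + list(hc_features)))
    hc_set = set(hc_features)
    intersection = [f for f in mdd_features if f in hc_set]
    return union, intersection
```

With 50 features per class, the union contains between 50 and 100 features depending on the overlap, and the intersection contains at most 50.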

5. Discussion

5.1. Feature Selection Analysis

To address the limitations of interpretability in existing time-frequency-domain studies for depression recognition, this paper proposes a tensor-based brain network feature extraction and class-specific feature selection algorithm for the identification of depressive disorders. Compared to conventional approaches like graph-theoretical quantification of brain network connectivity and binarized functional connectivity matrices, the correlation-based feature selection algorithm avoids the subjective threshold-setting issue in binarized matrices while mitigating the negative impact of redundant connections in graph-theoretical metrics on depression recognition, thereby achieving greater improvements in the diagnostic system. To mitigate the impact of individual brain energy differences among patients on recognition accuracy, a correlation-based feature selection method is adopted to extract common features across subjects.
As shown in Figure 3, despite differences in acquisition devices and electrode counts between the two datasets, significant similarities in abnormal functional connectivity are observed in the upper-right and lower-left regions of the brain. Table 6 provides a quantitative summary of these group differences, detailing the direction and magnitude of the observed effects.
The upper-right region involves the prefrontal lobe, temporal lobe, and central area. Abnormal functional connectivity in the prefrontal lobe dominates in both datasets, suggesting that the difficulty in regulating negative emotions among depressive patients is largely due to prefrontal dysfunction. This aligns with prior research [52], which found that prefrontal cortex dysfunction leads to significant declines in emotional regulation and cognitive control, impairing patients’ ability to suppress negative emotions, while reduced activity in the dorsolateral prefrontal cortex contributes to impaired decision-making and executive function. Meanwhile, abnormalities in the temporal lobe coincide with findings that most depressive patients exhibit hippocampal and amygdala dysfunction.
The involved lower-left brain regions include the parietal lobe and temporoparietal junction, indicating dysregulation in somatosensory processing and emotional integration among depressive disorder patients. The temporoparietal junction plays a role in emotion regulation and self-referential processing, making its dysfunction—when co-occurring with prefrontal lobe abnormalities—a hallmark feature of depressive disorders. Studies [53] suggest that parietal lobe dysfunction is often associated with impaired somatic perception and integration, accompanied by reduced efficiency of information processing. This leads us to hypothesize that parietal lobe abnormalities may contribute to the frequent comorbidity of anxiety disorders in depressive patients, as they exhibit heightened focus on negative stimuli.
The results of the feature selection algorithm confirm the potential of this method in clinical research. Screening the functional brain networks of patients and comparing them with those of healthy individuals helps identify abnormal brain regions in patients, thereby providing valuable guidance for the development of personalized treatment plans.

5.2. Analysis of Classification Results

Through comparative analysis of intersection and union operations in experimental results for depression classification, the intersection of the two feature sets was found to already demonstrate robust performance in depression identification. This confirms the cross-individual applicability of our extracted depressive disorder features. Meanwhile, while expanding feature quantity, the union feature set achieved marginal performance improvement—likely attributable to individual variability in pathogenesis and abnormal brain-region distribution among depressive patients, which constitutes a major research challenge in current depression studies [54]. Notably, classification results across both datasets revealed that certain added features negatively impacted performance, suggesting that pathogenesis heterogeneity may induce connection-specific abnormalities that compromise cross-individual recognition accuracy. Furthermore, Figure 4 demonstrates that our proposed feature selection algorithm achieves superior classification performance while utilizing fewer features. This indicates that our algorithm not only maintains classification accuracy but also addresses the common lack of clinical interpretability in conventional feature selection methods. These findings highlight the algorithm’s clinical application potential, as it simultaneously provides value for both the investigation of the pathogenesis of depressive disorders and the improvement of depression classification.

5.3. Study Limitations

Despite the promising performance of our proposed model, we recognize certain limitations that frame important directions for future work. From a methodological standpoint, the primary limitation is the validation of our model on only two public datasets: MODMA and PRED + CT. Although this allows for direct comparison with prior work, performance on a broader array of datasets with different clinical populations and EEG acquisition parameters is needed to fully establish the model’s generalizability. Furthermore, our framework treats Major Depressive Disorder (MDD) as a monolithic category and performs a binary classification against healthy controls. In a real-world clinical setting, the key challenge is often differential diagnosis. Moreover, the influence of pharmacological treatments and co-existing medical conditions on EEG signals is a major confounder not fully accounted for in our current model.
Comparing our results with those reported in prior studies allows for a broad comparison against the state of the art; however, we acknowledge that these results were obtained under heterogeneous experimental settings, including variations in data preprocessing pipelines, dataset splits, and hyperparameter tuning. Such discrepancies can introduce confounding variables, making a strictly direct and equitable comparison challenging. A comprehensive, large-scale benchmark study re-implementing all compared methods under a single, unified pipeline would be necessary for a definitive conclusion, representing a valuable direction for future work.

6. Conclusions

This paper proposes a brain network-based depression disorder analysis and recognition algorithm using correlation-driven feature selection, coupled with a CNN-BiLSTM deep learning model for classification. By employing correlation as the criterion for feature selection, this method addresses the lack of clinical interpretability inherent in traditional classification performance-based feature or channel selection algorithms.
Notably, the algorithm integrates a Topology Awareness (TA) penalty term into the CSMDCCMR feature selection framework, which constrains redundancy from spatially adjacent brain regions. This topological constraint not only enhances the discriminative power of the selected feature subset but also improves clinical interpretability by aligning features with known brain network dysfunction patterns in depression.
Additionally, the sparse tensor representation enables the extraction of functional connectivity matrices that best characterize MDD or HC traits, reducing feature dimensionality while preserving effective connection information to ensure high performance and accuracy in the recognition system. To validate the proposed method, the effectiveness of the feature selection algorithm and its classification accuracy were tested on both the MODMA and PRED + CT datasets. The results demonstrate superior performance compared to previous methods while also showing potential for the exploration of the pathogenesis of depression. Furthermore, analysis of the feature subset revealed significant differences in functional brain networks between MDD patients and healthy controls, providing valuable insights for the development of personalized treatment strategies.
In future work, the selected feature sets will be reconstructed as brain connectivity graph features to serve as inputs for deep learning networks, thereby enabling a more thorough exploration and utilization of all effective connection information within functional networks. Moreover, investigating the interactions between different brain regions and their control mechanisms over the whole-brain network from the perspective of brain control theory, particularly in the context of emotional processing control patterns, represents a highly promising research direction.

Author Contributions

All authors—Z.Z., Y.H., J.L. and Y.G.—contributed to different aspects of this work. Z.Z. and Y.H. were responsible for conceptualization, data curation, software implementation, and experimental validation. Z.Z. also led the original draft preparation and manuscript revision. Y.H. contributed to the validation and writing of the original draft. J.L. was involved in data curation, validation, and visualization. Y.G. provided resources, supervised the research, and contributed to writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Zhejiang Provincial Natural Science Foundation of China (grant number LZ25F010006) and the National Natural Science Foundation of China (grant numbers 62371171 and 62371178).

Institutional Review Board Statement

Our research did not involve direct interaction with human subjects or the collection of new data. Instead, we utilized the MODMA and PRED + CT datasets, which are publicly accessible from https://modma.lzu.edu.cn/data/index/ and https://predict.cs.unm.edu/, respectively. The original studies that collected these data had already obtained the necessary ethical approval from their respective Institutional Review Boards.

Informed Consent Statement

Our research did not involve direct interaction with human subjects or the collection of new data. Instead, we utilized the MODMA and PRED + CT datasets, which are publicly accessible from https://modma.lzu.edu.cn/data/index/ and https://predict.cs.unm.edu/, respectively. The providers of both datasets state that written informed consent was obtained from all participants prior to the experiment.

Data Availability Statement

The data used in this study are available from two public resources. The MODMA dataset (a Multi-modal Open Dataset for Mental-disorder Analysis), which provides multi-modal data for mental-disorder analysis, including EEG and speech recording data from clinically depressed patients and matched normal controls, can be accessed at https://modma.lzu.edu.cn/data/index/. The second dataset is available in the Patient Repository for EEG Data + Computational Tools (PRED + CT) at https://predict.cs.unm.edu/, reference number [PRED + CT 2017]. These data were derived from the following resource available in the public domain [42].

Acknowledgments

The authors thank all colleagues for their invaluable discussions and suggestions that improved this article.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
MDD: Major Depressive Disorder
EEG: Electroencephalogram
PLV: Phase-Locking Value
CNN: Convolutional Neural Network
BiLSTM: Bidirectional Long Short-Term Memory
LSTM: Long Short-Term Memory
CSMDCCMR: Class-Specific Maximal Dynamic Correlation Change and Minimal Redundancy
TA-CSMDCCMR: Topologically Aware CSMDCCMR

References

  1. Hasin, D.S.; Goodwin, R.D.; Stinson, F.S.; Grant, B.F. Epidemiology of Major Depressive Disorder: Results From the National Epidemiologic Survey on Alcoholism and Related Conditions. Arch. Gen. Psychiatry 2005, 62, 1097–1106.
  2. Mahato, S.; Paul, S. Detection of major depressive disorder using linear and non-linear features from EEG signals. Microsyst. Technol. 2019, 25, 1065–1076.
  3. Xu, D.D.; Rao, W.W.; Cao, X.L.; Wen, S.Y.; Che, W.I.; Ng, C.H.; Ungvari, G.S.; Du, Y.; Zhang, L.; Xiang, Y.T. Prevalence of major depressive disorder in children and adolescents in China: A systematic review and meta-analysis. J. Affect. Disord. 2018, 241, 592–598.
  4. Miller, A.H.; Raison, C.L. The role of inflammation in depression: From evolutionary imperative to modern treatment target. Nat. Rev. Immunol. 2016, 16, 22–34.
  5. Goldstein, B.I.; Shamseddeen, W.; Spirito, A.; Emslie, G.; Clarke, G.; Wagner, K.D.; Asarnow, J.R.; Vitiello, B.; Ryan, N.; Birmaher, B.; et al. Substance Use and the Treatment of Resistant Depression in Adolescents. J. Am. Acad. Child Adolesc. Psychiatry 2009, 48, 1182–1192.
  6. Seal, A.; Bajpai, R.; Agnihotri, J.; Yazidi, A.; Herrera-Viedma, E.; Krejcar, O. DeprNet: A deep convolution neural network framework for detecting depression using EEG. IEEE Trans. Instrum. Meas. 2021, 70, 1–13.
  7. Goldschmied, J.R.; Cheng, P.; Armitage, R.; Deldin, P.J. A preliminary investigation of the role of slow-wave activity in modulating waking EEG theta as a marker of sleep propensity in major depressive disorder. J. Affect. Disord. 2019, 257, 504–509.
  8. Murphy, O.W.; Hoy, K.; Wong, D.; Bailey, N.W.; Fitzgerald, P.B.; Segrave, R. Individuals with depression display abnormal modulation of neural oscillatory activity during working memory encoding and maintenance. Biol. Psychol. 2019, 148, 107766.
  9. Dang, W.; Gao, Z.; Lv, D.; Sun, X.; Cheng, C. Rhythm-dependent multilayer brain network for the detection of driving fatigue. IEEE J. Biomed. Health Inform. 2020, 25, 693–700.
  10. Croce, P.; Zappasodi, F.; Marzetti, L.; Merla, A.; Pizzella, V.; Chiarelli, A.M. Deep convolutional neural networks for feature-less automatic classification of independent components in multi-channel electrophysiological brain recordings. IEEE Trans. Biomed. Eng. 2018, 66, 2372–2380.
  11. Maheshwari, D.; Ghosh, S.K.; Tripathy, R.; Sharma, M.; Acharya, U.R. Automated accurate emotion recognition system using rhythm-specific deep convolutional neural network technique with multi-channel EEG signals. Comput. Biol. Med. 2021, 134, 104428.
  12. Sporns, O. Networks of the Brain; The MIT Press: Cambridge, MA, USA, 2010.
  13. Bullmore, E.; Sporns, O. Complex brain networks: Graph theoretical analysis of structural and functional systems. Nat. Rev. Neurosci. 2009, 10, 186–198.
  14. van den Heuvel, M.P.; Sporns, O.; Collin, G.; Scheewe, T.; Mandl, R.C.W.; Cahn, W.; Goñi, J.; Hulshoff Pol, H.E.; Kahn, R.S. Abnormal Rich Club Organization and Functional Brain Dynamics in Schizophrenia. JAMA Psychiatry 2013, 70, 783–792.
  15. Li, X.; Jing, Z.; Hu, B.; Zhu, J.; Zhong, N.; Li, M.; Ding, Z.; Yang, J.; Zhang, L.; Feng, L.; et al. A Resting-State Brain Functional Network Study in MDD Based on Minimum Spanning Tree Analysis and the Hierarchical Clustering. Complexity 2017, 2017, 11.
  16. Liu, W.; Zhang, C.; Wang, X.; Xu, J.; Chang, Y.; Ristaniemi, T.; Cong, F. Functional connectivity of major depression disorder using ongoing EEG during music perception. Clin. Neurophysiol. 2020, 131, 2413–2422.
  17. Xie, Y.; Yang, B.; Lu, X.; Zheng, M.; Fan, C.; Bi, X.; Li, Y. Anxiety and depression diagnosis method based on brain networks and convolutional neural networks. In Proceedings of the 2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Montreal, QC, Canada, 20–24 July 2020; pp. 1503–1506.
  18. Li, Y.; Cao, D.; Wei, L.; Tang, Y.; Wang, J. Abnormal functional connectivity of EEG gamma band in patients with depression during emotional face processing. Clin. Neurophysiol. 2015, 126, 2078–2089.
  19. Li, X.; La, R.; Wang, Y.; Hu, B.; Zhang, X. A Deep Learning Approach for Mild Depression Recognition Based on Functional Connectivity Using Electroencephalography. Front. Neurosci. 2020, 14, 192.
  20. Sun, S.; Li, X.; Zhu, J.; Wang, Y.; La, R.; Zhang, X.; Wei, L.; Hu, B. Graph theory analysis of functional connectivity in major depression disorder with high-density resting state EEG data. IEEE Trans. Neural Syst. Rehabil. Eng. 2019, 27, 429–439.
  21. Hasanzadeh, F.; Mohebbi, M.; Rostami, R. Graph theory analysis of directed functional brain networks in major depressive disorder based on EEG signal. J. Neural Eng. 2020, 17, 026010.
  22. Zhang, M.; Zhou, H.; Liu, L.; Feng, L.; Yang, J.; Wang, G.; Zhong, N. Randomized EEG functional brain networks in major depressive disorders with greater resilience and lower rich-club coefficient. Clin. Neurophysiol. 2018, 129, 743–758.
  23. Chen, W.; Cai, Y.; Li, A.; Jiang, K.; Su, Y. MDD brain network analysis based on EEG functional connectivity and graph theory. Heliyon 2024, 10, e36991.
  24. Huang, S.S.; Yu, Y.H.; Chen, H.H.; Hung, C.C.; Wang, Y.T.; Chang, C.H.; Peng, S.J.; Kuo, P.H. Functional connectivity analysis on electroencephalography signals reveals potential biomarkers for treatment response in major depression. BMC Psychiatry 2023, 23, 554.
  25. Baghernezhad, S.; Raouf, P.; Shalchyan, V.; Rostami, R.; Daliri, M.R. Graph theory analysis based on cross frequency coupling methods in major depressive disorder: A resting state EEG study. Comput. Biol. Med. 2025, 198, 111168.
  26. Earl, E.H.; Goyal, M.; Mishra, S.; Kannan, B.; Mishra, A.; Chowdhury, N.; Mishra, P. EEG based functional connectivity in resting and emotional states may identify major depressive disorder using machine learning. Clin. Neurophysiol. 2024, 164, 130–137.
  27. Teng, C.L.; Cong, L.; Wang, W.; Cheng, S.; Wu, M.; Dang, W.T.; Jia, M.; Ma, J.; Xu, J.; Hu, W.D. Disrupted properties of functional brain networks in major depressive disorder during emotional face recognition: An EEG study via graph theory analysis. Front. Hum. Neurosci. 2024, 18, 1338765.
  28. Shim, M.; Hwang, H.J.; Lee, S.H. Toward practical machine-learning-based diagnosis for drug-naïve women with major depressive disorder using EEG channel reduction approach. J. Affect. Disord. 2023, 338, 199–206.
  29. Hassan, M.; Kaabouch, N. Impact of feature selection techniques on the performance of machine learning models for depression detection using EEG data. Appl. Sci. 2024, 14, 10532.
  30. Hag, A.; Handayani, D.; Altalhi, M.; Pillai, T.; Mantoro, T.; Kit, M.H.; Al-Shargie, F. Enhancing EEG-based mental stress state recognition using an improved hybrid feature selection algorithm. Sensors 2021, 21, 8370.
  31. Sun, S.; Chen, H.; Luo, G.; Yan, C.; Dong, Q.; Shao, X.; Li, X.; Hu, B. Clustering-fusion feature selection method in identifying major depressive disorder based on resting state EEG signals. IEEE J. Biomed. Health Inform. 2023, 27, 3152–3163.
  32. Ma, X.A.; Xu, H.; Ju, C. Class-specific feature selection via maximal dynamic correlation change and minimal redundancy. Expert Syst. Appl. 2023, 229, 120455.
  33. Li, L.; Wang, X.; Li, J.; Zhao, Y. An EEG-based marker of functional connectivity: Detection of major depressive disorder. Cogn. Neurodynamics 2024, 18, 1671–1687.
  34. Saeedi, A.; Saeedi, M.; Maghsoudi, A.; Shalbaf, A. Major depressive disorder diagnosis based on effective connectivity in EEG signals: A convolutional neural network and long short-term memory approach. Cogn. Neurodynamics 2021, 15, 239–252.
  35. Anik, I.A.; Kamal, A.; Kabir, M.A.; Uddin, S.; Moni, M.A. A robust deep-learning model to detect major depressive disorder utilizing EEG signals. IEEE Trans. Artif. Intell. 2024, 5, 4938–4947.
  36. Xia, M.; Zhang, Y.; Wu, Y.; Wang, X. An end-to-end deep learning model for EEG-based major depressive disorder classification. IEEE Access 2023, 11, 41337–41347.
  37. Wang, Y.; Zhao, S.; Jiang, H.; Li, S.; Li, T.; Pan, G. M-MDD: A multi-task deep learning framework for major depressive disorder diagnosis using EEG. Neurocomputing 2025, 636, 130008.
  38. Aydore, S.; Pantazis, D.; Leahy, R.M. A note on the phase locking value and its properties. NeuroImage 2013, 74, 231–244.
  39. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423.
  40. Cai, H.; Yuan, Z.; Gao, Y.; Sun, S.; Li, N.; Tian, F.; Xiao, H.; Li, J.; Yang, Z.; Li, X.; et al. A multi-modal open dataset for mental-disorder analysis. Sci. Data 2022, 9, 178.
  41. Zhai, Y.; Yao, D. A study on the reference electrode standardization technique for a realistic head model. Comput. Methods Programs Biomed. 2004, 76, 229–238.
  42. Cavanagh, J.F.; Napolitano, A.; Wu, C.; Mueen, A. The Patient Repository for EEG Data plus Computational Tools (PRED plus CT). Front. Neuroinfor. 2017, 11, 67.
  43. Luu, P.; Tucker, D.M.; Makeig, S. Frontal midline theta and the error-related negativity: Neurophysiological mechanisms of action regulation. Clin. Neurophysiol. 2004, 115, 1821–1835.
  44. Chen, X.; Kong, Y.; Chang, H.; Gao, Y.; Liu, Z.; Coatrieux, J.L.; Shu, H. MGSN: Depression EEG lightweight detection based on multiscale DGCN and SNN for multichannel topology. Biomed. Signal Process. Control 2024, 92, 106051.
  45. Wang, Y.; Liu, F.; Yang, L. EEG-Based Depression Recognition Using Intrinsic Time-scale Decomposition and Temporal Convolution Network. In Proceedings of the BIBE2021: The Fifth International Conference on Biological Information and Biomedical Engineering, Hangzhou, China, 20–22 July 2021.
  46. Wang, H.G.; Meng, Q.H.; Jin, L.C.; Wang, J.B.; Hou, H.R. Amg: A depression detection model with autoencoder and multi-head graph convolutional network. In Proceedings of the 2023 42nd Chinese Control Conference (CCC), Tianjin, China, 24–26 July 2023; pp. 8551–8556.
  47. Sun, Y.; Hu, S.; Chambers, J.; Zhu, Y.; Tong, S. Graphic patterns of cortical functional connectivity of depressed patients on the basis of EEG measurements. In Proceedings of the 2011 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Boston, MA, USA, 30 August–3 September 2011; pp. 1419–1422.
  48. Zhang, T.; Hu, T.; Wu, M.; Xu, Z.; Chen, C.P. ACM-GNN: Adaptive Cluster-Oriented Modularity Graph Neural Network for EEG Depression Detection. IEEE Trans. Comput. Soc. Syst. 2025, 1–13.
  49. Yang, C.Y.; Chen, Y.Z. Support vector machine classification of patients with depression based on resting-state electroencephalography. Asian Biomed. Res. Rev. News 2024, 18, 212–223.
  50. Liu, W.; Jia, K.; Wang, Z. Graph-based EEG approach for depression prediction: Integrating time-frequency complexity and spatial topology. Front. Neurosci. 2024, 18, 1367212.
  51. Zhang, Z.; Meng, Q.; Jin, L.; Wang, H.; Hou, H. A novel EEG-based graph convolution network for depression detection: Incorporating secondary subject partitioning and attention mechanism. Expert Syst. Appl. 2024, 239, 122356.
  52. Fang, F.; Gao, Y.; Schulz, P.E.; Selvaraj, S.; Zhang, Y. Brain controllability distinctiveness between depression and cognitive impairment. J. Affect. Disord. 2021, 294, 847–856.
  53. Raichle, M.E.; MacLeod, A.M.; Snyder, A.Z.; Powers, W.J.; Gusnard, D.A.; Shulman, G.L. A default mode of brain function. Proc. Natl. Acad. Sci. USA 2001, 98, 676–682.
  54. Chen, T.; Guo, Y.; Hao, S.; Hong, R. Exploring self-attention graph pooling with EEG-based topological structure and soft label for depression detection. IEEE Trans. Affect. Comput. 2022, 13, 2106–2118.
Figure 1. The proposed pipeline for EEG classification. Key contributions include the extraction of functional connectivity features, a robust feature selection stage using the TA-CSMDCCMR algorithm to isolate the most class-specific biomarkers, and a custom-designed CNN-BiLSTM classifier to perform the final classification task.
Figure 2. Tucker decomposition of third-order tensors. G denotes the core tensor. A, B, and C denote feature matrices representing three different dimensions of X.
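As a rough illustration of the decomposition in Figure 2, the following NumPy sketch computes a truncated higher-order SVD (HOSVD), one common way to obtain a Tucker decomposition. The function names (`unfold`, `mode_dot`, `hosvd`, `reconstruct`), the toy tensor shape, and the fixed ranks are assumptions made for illustration; Table 1 specifies an HOSVD tolerance rather than fixed ranks, so this is not necessarily how the paper truncates each mode.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: bring axis `mode` to the front, flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_dot(tensor, matrix, mode):
    """Mode-n product: multiply `tensor` by `matrix` along axis `mode`."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)

def hosvd(tensor, ranks):
    """Truncated HOSVD: returns the core tensor and one factor matrix per mode."""
    factors = []
    for mode, rank in enumerate(ranks):
        # Leading left singular vectors of each unfolding form the factor matrix.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, factor in enumerate(factors):
        # Project the tensor onto each factor's column space.
        core = mode_dot(core, factor.T, mode)
    return core, factors

def reconstruct(core, factors):
    """Rebuild an approximation of X from the core G and factors A, B, C."""
    out = core
    for mode, factor in enumerate(factors):
        out = mode_dot(out, factor, mode)
    return out

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 5, 6))          # toy third-order tensor (assumed shape)
G, (A, B, C) = hosvd(X, ranks=(2, 3, 2))    # ranks chosen arbitrarily for the demo
print(G.shape, A.shape, B.shape, C.shape)   # (2, 3, 2) (4, 2) (5, 3) (6, 2)
```

With full ranks the reconstruction reproduces X up to floating-point error; smaller ranks trade reconstruction accuracy for a more compact core.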
Figure 3. (a,b) Results on the PRED + CT dataset ((a) HC; (b) MDD); (c,d) results on the MODMA dataset ((c) HC; (d) MDD).
Figure 4. Depression recognition results in different situations: (a) depression recognition results using feature unions on MODMA; (b) depression recognition results using feature unions on PRED + CT; (c) depression recognition results using feature intersections on MODMA; (d) depression recognition results using feature intersections on PRED + CT.
Table 1. Hyperparameter settings for the proposed model.
| Category | Parameter | Value |
|---|---|---|
| Signal Processing and Feature Engineering | FIR Filter Order | 50 |
| | Frequency Band (Alpha) | 8–13 Hz |
| | Sampling Rate | 250 Hz |
| | ICA Components to Remove | 5 (out of 128) |
| | HOSVD Tolerance | 10⁻² |
| | Features Selected per Class | 50 |
| CNN Architecture | Conv. Layer 1 | Kernels: 32; Size: 3 × 3; Dropout: 0.2 |
| | Conv. Layer 2 | Kernels: 64; Size: 3 × 3; Dropout: 0.3 |
| | Conv. Layer 3 | Kernels: 128; Size: 3 × 3; Dropout: 0.4 |
| | Common CNN Parameters | Activation: ReLU; Pooling: 2 × 2 Max |
| Bi-LSTM Architecture | Hidden Layers | 2 |
| | Hidden Units | 128 |
| | Dropout | 0.5 |
| | Attention Mechanism | Self-Attention |
| Training and Optimization | Optimizer | AdamW |
| | Learning Rate (Initial) | 0.001 |
| | Learning Rate (Minimum) | 1 × 10⁻⁵ |
| | LR Scheduler | Cosine Annealing (T_max: 50) |
| | Weight Decay | 1 × 10⁻⁴ |
| | Epochs | 150 |
| | Batch Size | 16 |
| | Loss Function | Weighted Cross-Entropy |
| | Early Stopping Patience | 20 Epochs |
| | Cross-Validation | 10-fold |
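To make the learning-rate settings in Table 1 concrete, the sketch below evaluates the standard closed-form cosine annealing schedule using only the Python standard library. It assumes the minimum learning rate is 1 × 10⁻⁵ and that the schedule follows the usual formula lr(t) = lr_min + 0.5 (lr_init − lr_min)(1 + cos(π t / T_max)); the function name `cosine_lr` is hypothetical, and how the schedule behaves beyond epoch 50 depends on the scheduler implementation, which the table does not state.

```python
import math

LR_INIT = 1e-3   # initial learning rate (Table 1)
LR_MIN = 1e-5    # minimum learning rate (assumed reading of Table 1)
T_MAX = 50       # cosine annealing period in epochs (Table 1)

def cosine_lr(epoch, lr_init=LR_INIT, lr_min=LR_MIN, t_max=T_MAX):
    """Closed-form cosine annealing value at a given epoch."""
    return lr_min + 0.5 * (lr_init - lr_min) * (1 + math.cos(math.pi * epoch / t_max))

# The rate starts at LR_INIT, passes the midpoint at epoch 25,
# and reaches LR_MIN at epoch T_MAX.
for epoch in (0, 25, 50):
    print(epoch, f"{cosine_lr(epoch):.6f}")
```

Under these assumptions the rate decays smoothly from 0.001 to its minimum over the first 50 of the 150 epochs.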
Table 2. Comprehensive performance on the two datasets.
| Dataset | Accuracy | Sensitivity | Specificity | AUC | MSE | MAE |
|---|---|---|---|---|---|---|
| MODMA | 95.96 ± 1.25% | 93.40 ± 2.10% | 97.85 ± 0.95% | 95.70 ± 1.50% | 0.043 ± 0.0480 | 0.098 ± 0.1020 |
| PRED + CT | 94.90 ± 1.40% | 90.95 ± 2.55% | 97.10 ± 1.10% | 96.95 ± 1.15% | 0.048 ± 0.0165 | 0.085 ± 0.0260 |
Table 3. Comparison with advanced studies on MODMA.
| Method | Feature and Classifier | Accuracy (%) |
|---|---|---|
| Chen et al. [44] | Graph theory indicators; SGP-SL | 84.91 |
| Wang et al. [45] | Time-frequency feature; ITD + L-TCN | 86.67 |
| Wang et al. [46] | Spatial feature; AMG | 88.68 |
| Sun et al. [47] | Time-frequency feature; MGSN | 89.56 |
| Zhang et al. [48] | ACM-GNN | 95.46 |
| Ours | PLV features; CNN-BiLSTM | 95.96 |
Table 4. Comparison with advanced studies on PRED + CT.
| Method | Feature and Classifier | Accuracy (%) |
|---|---|---|
| Zhang et al. [48] | ACM-GNN | 89.55 |
| Yang & Chen [49] | t-test feature selection + SVM | 91.33 |
| Liu et al. [50] | DE + SVM | 82.79 |
| Liu et al. [50] | EEGNet | 90.05 |
| Zhang et al. [51] | SSPA-GCN | 83.17 |
| Ours | PLV features + CNN-BiLSTM | 94.90 |
Table 5. Ablation study results of the proposed model.
| Model Variant | Accuracy (%) |
|---|---|
| w/o Tucker Decomposition | 74.55 |
| w/o Tucker Decomposition + PCA | 61.73 |
| w/o TA-CSMDCCMR + ReliefF | 87.06 |
| w/o CNN | 89.45 |
| w/o BiLSTM | 87.09 |
| w/o TA | 95.78 |
| w/o classifier + SVM | 91.11 |
| Full Proposed Model | 95.96 |
Table 6. Group comparison of mean functional connectivity within key ROIs identified in the discussion.
| Region of Interest (ROI) | Hypothesized Role | Group | Mean Connectivity ± SD | Statistics (MDD vs. HC) |
|---|---|---|---|---|
| 1. Prefrontal Lobe | Impaired emotional regulation and cognitive control | HC | 0.62 ± 0.11 | t(48) = 3.18, p = 0.003, Cohen’s d = 0.90 |
| | | MDD | 0.52 ± 0.13 | |
| 2. Temporal Lobe | Dysfunction in emotional memory (hippocampus/amygdala) | HC | 0.58 ± 0.14 | t(48) = 2.12, p = 0.039, Cohen’s d = 0.60 |
| | | MDD | 0.51 ± 0.12 | |
| 3. Parietal Lobe | Impaired somatic perception and information processing | HC | 0.38 ± 0.09 | t(48) = 3.01, p = 0.004, Cohen’s d = 0.85 |
| | | MDD | 0.47 ± 0.10 | |
| 4. Temporoparietal Junction (TPJ) | Dysregulated self-referential processing | HC | 0.41 ± 0.10 | t(48) = 2.05, p = 0.046, Cohen’s d = 0.58 |
| | | MDD | 0.47 ± 0.11 | |
Note: HC = Healthy Controls (n = 25); MDD = Major Depressive Disorder (n = 25); SD = Standard Deviation; ROI = Region of Interest. Statistical analysis was performed using independent-samples t-tests. The significance level was set at α = 0.05. Cohen’s d indicates the effect size.
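For readers who want to relate the summary statistics in Table 6 to the reported test values, the sketch below recomputes an independent-samples t statistic and Cohen's d from group means, SDs, and sample sizes. It assumes a pooled-variance (Student's) t-test, which is consistent with the reported 48 degrees of freedom; the function name `pooled_t_and_d` is hypothetical. Small differences from the tabulated t and d are to be expected, for example because the means and SDs in the table are rounded.

```python
import math

def pooled_t_and_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Student's t (equal variances assumed) and Cohen's d from summary statistics."""
    df = n_a + n_b - 2
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / df
    pooled_sd = math.sqrt(pooled_var)
    diff = mean_a - mean_b
    t = diff / (pooled_sd * math.sqrt(1 / n_a + 1 / n_b))
    d = diff / pooled_sd
    return t, d, df

# Prefrontal lobe row: HC 0.62 ± 0.11 vs. MDD 0.52 ± 0.13, n = 25 per group.
t, d, df = pooled_t_and_d(0.62, 0.11, 25, 0.52, 0.13, 25)
print(f"t({df}) = {t:.2f}, Cohen's d = {d:.2f}")
```

The sign of t depends only on which group is passed first, so the magnitude is the quantity to compare across rows.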
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zhang, Z.; Hu, Y.; Lu, J.; Gao, Y. Brain Network Analysis and Recognition Algorithm for MDD Based on Class-Specific Correlation Feature Selection. Information 2025, 16, 912. https://doi.org/10.3390/info16100912

Chicago/Turabian Style

Zhang, Zhengnan, Yating Hu, Jiangwen Lu, and Yunyuan Gao. 2025. "Brain Network Analysis and Recognition Algorithm for MDD Based on Class-Specific Correlation Feature Selection" Information 16, no. 10: 912. https://doi.org/10.3390/info16100912
