1. Introduction
Major depressive disorder (MDD) remains one of the most prevalent neuropsychiatric conditions globally, characterized by persistent low mood and significant cognitive deficits [
1]. With over 264 million individuals affected worldwide [
2,
3], MDD contributes significantly to functional impairment in daily and occupational contexts and represents a leading cause of disability [
4]. In total, it accounts for nearly one million suicide-related deaths annually. Current diagnostic protocols rely predominantly on clinician judgment and patient self-reports, contributing to substantial underdiagnosis, particularly in low- and middle-income countries, where an estimated 76–85% of cases go without proper treatment [
5]. These challenges highlight the urgent need for objective, biomarker-driven diagnostic tools to enable early intervention and alleviate pressures on mental healthcare systems. Neuroimaging-based classifiers have shown considerable potential in providing such tools, offering a scalable complement to clinical assessment.
Multimodal magnetic resonance imaging (MRI) has emerged as a key modality for identifying neurobiological alterations in MDD. Structural MRI (sMRI) reveals consistent abnormalities in cortical and subcortical morphology across multisite cohorts [
6,
7], while functional MRI (fMRI)—including both resting-state and task-based paradigms—has demonstrated dysregulation in large-scale neural networks [
8,
9,
10,
11,
12]. These findings suggest that integrating sMRI and fMRI biomarkers could significantly enhance the accuracy and generalizability of automated diagnostic systems.
A major obstacle in developing such systems is the scarcity of single-site datasets sufficient for training robust models. While the centralized aggregation of multisite data is a common solution [
13,
14,
15], it raises serious ethical and legal concerns regarding data privacy [
16], particularly given the sensitive nature of neuroimaging data [
17]. Federated learning (FL) provides a promising alternative by enabling decentralized model training without sharing raw data [
18,
19,
20]. Among FL algorithms, federated averaging (FedAvg) is widely adopted for its efficiency [
21], and preliminary applications to brain fMRI have demonstrated feasibility [
22]. As a privacy-preserving deep learning paradigm, federated learning aims to build a model or system that can utilize multisite information without raw data transfer.
However, domain shift may occur due to non-biological variability that can be attributed to differences in scanner manufacturers, non-standardized imaging acquisition protocols, and other intrinsic factors across different sites, resulting in both domain-invariant and domain-specific features across sites [
23,
24,
25]. As a result, the optimal update direction differs from site to site [26]. A single global model, such as one trained via standard FedAvg, is fitted primarily to global information and fails to account for these domain-specific characteristics, often leading to suboptimal performance at individual sites. In contrast, personalized FL (pFL) models allow for site-specific adaptation while still leveraging shared knowledge across sites [27]. We therefore aim to learn a personalized model for each site rather than a single global one, aligning local and global representations through gradient matching and contrastive learning to achieve a better trade-off between generalization and personalization.
To overcome these challenges, we propose a personalized federated learning algorithm, named Personalized Federated Gradient Matching and Contrastive Optimization (pF-GMCO), designed for multisite MDD classification using multimodal MRI. Our approach incorporates two novel technical components: (1) a gradient matching mechanism that adaptively weights contributions from different sites based on distribution similarity, effectively mitigating domain shifts, and (2) a model contrastive loss that promotes personalized model optimization tailored to each site’s data characteristics. Furthermore, we integrate multimodal compact bilinear pooling (MCB) [
28] to capture high-order interactions between sMRI-derived morphological features and fMRI-derived representations (ReHo and ALFF), significantly enhancing feature integration and the discriminative power.
We evaluated our method on the REST-meta-MDD consortium dataset, comprising 2293 subjects from 23 sites. Our method achieved an average classification accuracy of 79.07%, outperforming existing FL approaches and alternative multimodal fusion strategies, alongside favorable interpretability and robustness. The contributions of this work are as follows:
We propose pF-GMCO, a personalized federated learning algorithm that combines adaptive gradient matching with contrastive optimization to learn site-specific models while leveraging cross-site knowledge in a privacy-preserving manner.
We introduce an MCB-based fusion module for integrating sMRI and fMRI features, significantly improving the classification performance compared to unimodal and conventional fusion approaches.
We demonstrate the clinical applicability and superior performance of our framework through extensive multisite experiments, with visualization results highlighting biologically plausible regions such as the default mode network and frontoparietal network, consistent with the established MDD pathophysiology.
2. Materials and Methods
2.1. Data and Preprocessing
Multimodal MRI data from the REST-meta-MDD project (http://rfmri.org/REST-meta-MDD, accessed on 1 January 2020) were included in this study, covering MDD patients and matched healthy controls (HCs) from 25 imaging sites. We used the data of 2293 subjects from 23 sites, including 1028 MDD patients and 1225 matched healthy controls (sites S4 and S19 were removed, as they exhibited some data overlap with other sites). Notably, the multimodal MRI data in the REST-meta-MDD project are publicly available, eliminating the need for ethical approval in our study. Because some private datasets are small (such as #S5, #S12, etc.), it is difficult to optimize local models on them effectively. We therefore randomly divided the 23 sites into five federated union clients, ensuring a similar range of subjects within each client. The details of the federated clients are shown in Table 1. The collected subjects were matched in terms of gender, age, and education. The patients were diagnosed as having MDD based on the ICD-10 or DSM-IV. The subject information and the imaging acquisition parameters of each site can be seen on the project website.
In this study, we use the T1-weighted brain sMRI data and resting-state fMRI (rs-fMRI) data for experiments. The sMRI data are 3D volumes and are preprocessed using the CAT-12.8 toolbox (fil.ion.ucl.ac.uk/spm/software/). Because the rs-fMRI data are 4D and are not aligned with the 3D feature space of the sMRI data, we further calculate the amplitude of low-frequency fluctuations (ALFF) and regional homogeneity (ReHo) from the preprocessed rs-fMRI data. Both metrics are widely adopted in resting-state fMRI studies to capture localized neural activity and synchronization, which are frequently disrupted in MDD. ALFF quantifies the intensity of spontaneous low-frequency oscillations (0.01–0.1 Hz), reflecting baseline neural activity levels [
7,
29], while ReHo measures the temporal similarity of BOLD signals within a local neighborhood, serving as an indicator of regional functional coherence [
30]. Previous studies have demonstrated aberrant ALFF and ReHo patterns in MDD patients, underscoring their value as functional biomarkers of the disorder.
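To make the two voxel-wise metrics concrete, the following is a minimal sketch and not the preprocessing pipeline used in this study. It assumes that ALFF is taken as the mean spectral amplitude of a mean-centred BOLD time series inside the 0.01–0.1 Hz band and that ReHo is computed as Kendall's coefficient of concordance over the time series of a voxel neighborhood (a common choice, assumed here). The function names `alff` and `kendall_w`, the repetition time `tr`, and the naive DFT are illustrative choices.

```python
import cmath
import math


def alff(signal, tr, low=0.01, high=0.1):
    """Mean DFT amplitude of a mean-centred time series inside [low, high] Hz."""
    n = len(signal)
    mean = sum(signal) / n
    centred = [v - mean for v in signal]
    amplitudes = []
    for k in range(1, n // 2 + 1):
        freq = k / (n * tr)  # frequency of DFT bin k in Hz
        if low <= freq <= high:
            coeff = sum(c * cmath.exp(-2j * math.pi * k * t / n)
                        for t, c in enumerate(centred))
            amplitudes.append(2 * abs(coeff) / n)
    return sum(amplitudes) / len(amplitudes) if amplitudes else 0.0


def kendall_w(series):
    """Kendall's W for K equally long time series (assumes no tied values)."""
    k, n = len(series), len(series[0])
    rank_sums = [0.0] * n
    for s in series:
        order = sorted(range(n), key=lambda t: s[t])
        for rank, t in enumerate(order, start=1):
            rank_sums[t] += rank  # accumulate the rank of time point t
    mean_rank = k * (n + 1) / 2
    s_stat = sum((r - mean_rank) ** 2 for r in rank_sums)
    return 12 * s_stat / (k ** 2 * (n ** 3 - n))
```

Under these assumptions, a time series dominated by a 0.05 Hz oscillation yields a larger ALFF than one dominated by a 0.2 Hz oscillation, and a neighborhood of identical time series yields a ReHo of 1.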
The projection of 4D rs-fMRI into a 3D space for ALFF and ReHo extraction provides two key advantages for multimodal fusion: (1) it preserves the critical spatial patterns of brain function while ensuring dimensional compatibility with sMRI features, which is a prerequisite for unified model design; (2) it establishes consistent latent granularity that allows structural and functional information to complement each other effectively, ultimately leading to more computationally efficient and discriminative joint representations for the diagnosis of MDD.
2.2. Method Overview
The proposed pFL algorithm mainly includes two updating steps for the training of personalized federated models. Specifically, we propose a novel adaptive aggregating mechanism, based on calculating the federated gradient matching loss. Then, we introduce gradient matching and the model contrastive loss for regularization to further alleviate domain shifts and enable personalized optimization. The pipeline of pF-GMCO can be seen in
Figure 1b, and the details are shown in Algorithm 1.
Algorithm 1. pF-GMCO algorithm
Require: Private datasets in multiple sites, pFL models, number of epochs L, local learning rate, federated learning rate, variance of Gaussian noise.
Ensure: Optimal weights of the pFL model in each site.
1: for each epoch do
2:   Local updating:
3:   for each site do
4:     Compute local classification loss
5:     Update pFL models locally
6:   end for
7:   Federated adaptive aggregation:
8:   for each site do
9:     Compute gradient matching loss
10:    Compute the adaptive aggregating weights
11:    Federated aggregation
12:  end for
13:  Federated optimization:
14:  for each site do
15:    Compute gradient matching loss
16:    Compute model contrastive loss
17:    Compute classification loss
18:    Compute federated optimization loss
19:    Update personalized federated parameters
20:  end for
21:  Update multimodal classifier
22: end for
23: Return the pFL model for each federated site.
Then, we introduce the MCB strategy to fuse multimodal features encoded from sMRI and fMRI data, making full use of structural and functional information. It aims to exploit the complementary advantages of the different types of MRI data and improve the performance of MDD classification in each MRI dataset.
We next provide the details of our proposed pFL algorithm. Specifically, there are N medical sites in the federation. The private dataset in federated site can be denoted as . denotes the brain MRI data, containing the sMRI data , ALFF data , and ReHo data . represents the diagnostic label of each subject, where 0 indicates normal controls and 1 indicates MDD patients, respectively. Our objective is to train a pFL model for site to improve the performance of MDD classification in without transferring raw data from other federated sites . Significantly, we only introduce the details of the pFL models’ federated training process using sMRI data here; the pFL models using ALFF and ReHo data share the same training process. To simplify the expressions, we use here to represent the pFL models for sMRI data, containing two main modules: (1) encoder and classifier and (2) gradient matching module .
A federated encryption mechanism is essential for sharing gradient information across sites [
31], and differential privacy encryption [
32] has a relatively low computational load and communication consumption. Specifically, it provides a privacy-preserving bound to ensure that private or sensitive information is not exposed in the shared information. Inspired by a previous study, we found that adding noise, such as Gaussian noise, to federated information is an effective way to realize differential privacy and limit the granularity of the information. Thus, this provides a good privacy guarantee for each federated site. Specifically, we construct the Gaussian noise generator
for each site, which can be defined as
where
P represents the federated information shared from the federated site
, such as gradients or model parameters.
represents the Gaussian noise for privacy preservation (mean value is 0 and variance value is
). Hereby, differential privacy of a specific privacy-preserving bound can be guaranteed across sites by linking the parameter
in this random encrypted mechanism. A larger
will blur the shared gradients more and can even be harmful for model training, so the balance between the degree of privacy preservation and model performance can be adjusted via the parameter
.
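The noise mechanism described above can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used in this study: the shared information P is represented as a flat list of floats, zero-mean Gaussian noise with the chosen variance is added element-wise, and the helper name `add_gaussian_noise` is hypothetical. A formal differential privacy bound would additionally require bounding the sensitivity of P (for example by clipping), which is not shown here.

```python
import random


def add_gaussian_noise(shared, variance, rng=None):
    """Return a noisy copy of `shared`, a flat list of floats.

    Each entry receives independent noise drawn from N(0, variance).
    """
    rng = rng or random.Random()
    std = variance ** 0.5
    return [p + rng.gauss(0.0, std) for p in shared]


# Example: perturb three parameters before sharing them with other sites.
noisy = add_gaussian_noise([0.5, -1.0, 2.0], variance=0.001, rng=random.Random(0))
```

A variance of 0 returns the parameters unchanged, while larger values blur them more strongly, which is the privacy and performance trade-off discussed in the text.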
2.3. Federated Adaptive Aggregation Based on Gradient Matching
Similarly to the implementation of the federated training algorithm [
22], we split the updating process into local training and federated optimization. Specifically, we set
as the initial model for site
. In iteration
t, the model
is updated using local dataset
by calculating the loss in classifying diagnosis labels. We use the cross-entropy loss function
and the classification loss
, which can be formulated as
Then, the parameters
of model
would be updated locally,
where
is the learning rate. Then, model
would be updated to
and be ready for federated information aggregation after learning local knowledge.
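As a toy illustration of the local updating step, the sketch below computes a cross-entropy classification loss and applies one gradient-descent update. It is a simplification under stated assumptions: a linear two-class classifier on precomputed feature vectors stands in for the encoder and classifier, plain gradient descent stands in for the optimizer, and the names `cross_entropy` and `local_step` are hypothetical.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(weights, batch):
    """Mean cross-entropy of a linear classifier over (features, label) pairs."""
    loss = 0.0
    for x, y in batch:
        logits = [sum(w * v for w, v in zip(row, x)) for row in weights]
        loss -= math.log(softmax(logits)[y])
    return loss / len(batch)


def local_step(weights, batch, lr):
    """One gradient-descent step on the mean cross-entropy loss."""
    grads = [[0.0] * len(row) for row in weights]
    for x, y in batch:
        logits = [sum(w * v for w, v in zip(row, x)) for row in weights]
        probs = softmax(logits)
        for c, p in enumerate(probs):
            err = p - (1.0 if c == y else 0.0)  # d(loss)/d(logit_c)
            for j, v in enumerate(x):
                grads[c][j] += err * v / len(batch)
    return [[w - lr * g for w, g in zip(row, grow)]
            for row, grow in zip(weights, grads)]
```

Starting from zero weights, a single step with a learning rate of 0.01 already lowers the loss on a separable toy batch.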
We extract gradient information from the local training process of each site for federated adaptive aggregation [
33]. Specifically, the gradient matching module
in site
is encrypted and shared with the other sites
to calculate and aggregate the encrypted gradient
. Previous studies have shown that different features may have greater homogeneity when their gradients have a similar distribution. The core idea of the gradient matching mechanism is to maximize the similarity between gradients calculated by inputting different features with the same model parameters. As for site
, we use the encoding features of MRI data
as the input of the gradient matching model
, which can be formulated as follows:
Considering that the gradient matching loss can be seen as the similarity of the data distribution among the other sites
and local site
, we define the adaptive aggregating weights
for federated information
in iteration
t, which can be formulated as follows:
Then, the encrypted local parameters from the federation can be weighted adaptively according to the gradient matching loss. The model would be updated by aggregating federated information,
where
is the regularization term for the parameters in federated aggregation. Thus, the specified aggregated weights can utilize more information from federations with a similar data distribution, thus alleviating the effects of domain shifts.
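The adaptive aggregation idea can be sketched as below. The exact loss and weighting formulas are not reproduced here, so the sketch relies on explicit assumptions: the gradient matching loss is taken as one minus the cosine similarity between gradient vectors, the aggregation weights as a softmax over the negative losses, and the scalar `mix` is a hypothetical stand-in for the regularization term that controls how much federated information enters the local parameters.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def gradient_matching_loss(local_grad, remote_grad):
    """Assumed form: 1 - cosine similarity (aligned gradients give 0)."""
    return 1.0 - cosine(local_grad, remote_grad)


def aggregation_weights(losses):
    """Assumed form: softmax over negative losses (smaller loss, larger weight)."""
    exps = [math.exp(-l) for l in losses]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(local_params, remote_params, weights, mix):
    """Blend local parameters with the weighted average of remote parameters."""
    out = []
    for j, p in enumerate(local_params):
        remote_avg = sum(w * r[j] for w, r in zip(weights, remote_params))
        out.append((1.0 - mix) * p + mix * remote_avg)
    return out
```

With this weighting, a site whose gradients point in the same direction as the local ones contributes more to the aggregated model than a site whose gradients are orthogonal or opposed.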
2.4. Federated Model Contrastive Optimization
The purpose of our federated algorithm is to train a federated personalized model for each site, which can enable better performance during MDD classification in local private datasets. Thus, the model should be updated towards site-specific optimization. Although federated adaptive aggregation can utilize global information from cross-site MRI datasets, the aggregated model may deviate from the local optimum and need to be corrected for site-specific personalization. Thus, we further propose personalized optimization based on the combination of the model contrastive loss and gradient matching loss.
We construct the model contrastive loss
to measure the similarity in the features encoded by different models [
34]. Specifically, we extract the encoding features of different models using the same MRI data
, including the federated features
, local features
, and previous features
encoded by the federated model updated in the last iteration
. The contrastive loss is based on calculating the cosine similarity
between different encoding features. It can be formulated as
where
denotes the relevant parameters. Then, the contrastive loss would be minimized to ensure personalized optimization while aggregating federated information from the other sites.
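A model contrastive loss of this kind can be sketched as follows. The roles of the three feature vectors are an assumption made for illustration: the federated features act as the anchor, the local features as the positive sample, and the previous-iteration features as the negative sample. The temperature value and the function name are likewise hypothetical.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def model_contrastive_loss(anchor, positive, negative, temperature=0.5):
    """-log of the softmax share of the positive pair (assumed roles, see text)."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = math.exp(cosine(anchor, negative) / temperature)
    return -math.log(pos / (pos + neg))
```

The loss is small when the anchor is close to the positive features and far from the negative ones, and it equals log 2 when both are equally similar to the anchor.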
We also maintain the gradient matching loss to further constrain the updating direction of the personalized federated model, which can prevent the pFL model from forgetting the global information. According to Equation (
4), the gradient matching loss in federated optimization can be formulated as
Then, the classification loss
for classifying the diagnosis labels can be formulated as
Finally, we build the federated optimized objective function and the personalized federated model
with parameters
, which can be formulated as
where
and
denote the learnable parameters to scale and shift the federated loss value.
2.5. Fusion Strategy for Multimodal MRI Data
To extract and utilize more information to improve the performance of MDD diagnostic classification, we introduce a fusion strategy for multimodal MRI data. Considering the different feature spaces between 3D sMRI data and 4D rs-fMRI data, we calculated the ALFF and ReHo metrics based on the 4D rs-fMRI data to convert the functional features into 3D space. Inspired by previous studies, we use the MCB pooling method to combine and fuse multimodal features encoded from different MRI data. Notably, standard bilinear pooling computes the outer product of two feature vectors, resulting in a high-dimensional representation that captures all multiplicative interactions. To avoid the high computational cost, MCB uses count sketch based on a linear projection encoding layer and fast Fourier transform (FFT) for convolution. Thus, the MCB for different modality features can be formulated as
where
F denotes the FFT operation and
denotes the inverse FFT, while ⊙ is element-wise multiplication. Then, the vector
is a compact representation capturing the multiplicative interactions between features
a and
b, which can fuse the different modality features effectively.
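The compact bilinear pooling step can be illustrated with the short sketch below. It assumes the usual construction: each feature vector is projected with a count sketch (a random bucket index and a random sign per input dimension), and the two sketches are combined by an element-wise product in the frequency domain followed by an inverse transform. A naive DFT replaces the FFT to keep the example dependency-free, and the helper names are hypothetical.

```python
import cmath
import math
import random


def make_sketch_params(n, d, rng):
    """Random bucket index and sign for each of the n input dimensions."""
    return ([rng.randrange(d) for _ in range(n)],
            [rng.choice((-1.0, 1.0)) for _ in range(n)])


def count_sketch(x, hashes, signs, d):
    """Project x into d dimensions: out[hashes[i]] += signs[i] * x[i]."""
    out = [0.0] * d
    for value, h, s in zip(x, hashes, signs):
        out[h] += s * value
    return out


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(v * cmath.exp(sign * 2j * math.pi * k * t / n)
               for t, v in enumerate(x)) for k in range(n)]
    return [v / n for v in out] if inverse else out


def mcb(a, b, params_a, params_b, d):
    """Fuse a and b into a d-dimensional vector via sketches and the DFT."""
    sa = count_sketch(a, *params_a, d)
    sb = count_sketch(b, *params_b, d)
    spectrum = [fa * fb for fa, fb in zip(dft(sa), dft(sb))]
    return [v.real for v in dft(spectrum, inverse=True)]
```

By the convolution theorem, the result equals the circular convolution of the two sketches, which is why the outer product never needs to be formed explicitly.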
The fusion strategy can be seen in
Figure 1c. Firstly, we use the single-modal MRI data to train a pFL model for each federated site. Then, we fuse the features encoded from the ALFF data and ReHo data for complete rs-fMRI representations using MCB pooling. After this, the rs-fMRI representations would be fused with features encoded from sMRI data to combine the structural and functional features. Thus, the fused multimodal MRI representation is
. Then, we further construct a multilayer perceptron (MLP) network
as a multimodal classifier for MDD diagnostic classification in site
. The parameters
are updated while freezing the parameters of single-modality models
, which can be optimized as follows:
Thus, our method can not only train pFL models for single-modal MRI data but also can utilize and combine structural and functional features encoded from multimodal MRI data to further improve the performance of MDD diagnosis.
3. Results
3.1. Implementation Details
The models are trained on an Ubuntu 18.04.1 server with two eight-core Intel E5 2609 1.7 GHz processors and four NVIDIA V100 graphics processing units. The code is written in Python 3.7.1 with PyTorch 1.13.1 [35]. Each model in our framework is trained for 80 epochs in total, and the batch size is set to 8. We use the AdamW optimizer [36] with weight decay of 0.95. The initial learning rate is set to 0.01, and the federated learning rate is set to 0.001. The federated aggregation process is performed once at the end of each epoch. The variance of the Gaussian noise is set to 0.001, according to the results of the control analysis presented in Section 3.4. To avoid the additional influence caused by the network architecture, we use 3D-ResNet-10 as the backbone for model training to compare the performance of different learning algorithms.
With 3D-ResNet-10, the encoder produces 1024 features for each of the sMRI, ALFF, and ReHo inputs. Notably, the MCB pooling strategy used for the fusion of multimodal features does not change the number of latent features, so the fused multimodal representation also has 1024 features. To keep the MLP module simple, we use three fully connected layers. The size of the first layer corresponds to the dimensions of the fused features, the second layer’s size is set to 128, and, as all tasks involve binary classification for MDD diagnosis, the size of the output layer is 2.
The experiments were carried out using a five-fold cross-validation strategy. Four folds were used for model training and the remaining one for testing. We used the average accuracy (ACC) and area under the curve (AUC) during cross-validation to evaluate the classification performance.
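The evaluation protocol can be sketched as follows. This is a minimal illustration, not the evaluation code of this study: it assumes a simple shuffled split into five folds without stratification, and it computes the AUC as the probability that a randomly chosen MDD subject receives a higher score than a randomly chosen control. All function names are hypothetical.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Return k (train_indices, test_indices) pairs covering range(n)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [(sorted(j for f in folds[:i] + folds[i + 1:] for j in f),
             sorted(folds[i])) for i in range(k)]


def accuracy(labels, preds):
    return sum(int(y == p) for y, p in zip(labels, preds)) / len(labels)


def auc(labels, scores):
    """P(score of a positive > score of a negative); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

Averaging `accuracy` and `auc` over the five test folds gives the ACC and AUC summary used in the comparisons.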
3.2. Comparison with SOTA Federated Learning Methods
We first train a local MDD identification model using single-modal MRI data for each client as the baseline. We use 3D-ResNet-10 as the backbone for all experiments. To evaluate the performance of our federated learning algorithm, we compare it against centralized learning (aggregating all site data into a unified training set) and several widely used FL methods: FedAvg [22] (aggregating model parameters from each site to optimize a single global model), FedProx [37] (an improved version of FedAvg with a regularization term), and FedALA [38] (a personalized FL method with adaptive local aggregation).
The performance of the diagnostic models trained by different learning algorithms can be found in
Table 2 and
Figure 2. The local models achieved average accuracy of 74.55 ± 0.31%, 58.23 ± 0.39%, and 60.10 ± 0.42%, based on using sMRI data, ALFF data, and ReHo data, respectively. Compared to these local baselines, the centralized learning approach does not yield significant improvements, likely due to the adverse effects of domain shifts when aggregating multisite data directly. Furthermore, models trained using FedAvg perform even more poorly than the local models, suggesting that the naive averaging of heterogeneous model parameters can impair federated model performance.
In contrast, federated methods designed to mitigate domain shift, such as FedProx with its added regularization constraint and FedALA with adaptive aggregation, show improved performance. Among all compared methods, our pF-GMCO algorithm achieves the best results in MDD classification across multisite MRI data. The average ACCs of our method across all federated sites reach 77.19 ± 0.25%, 61.43 ± 0.29%, and 64.17 ± 0.35% when using sMRI data, ALFF data, and ReHo data, respectively, demonstrating its effectiveness in leveraging distributed data while preserving privacy and adapting to site-specific characteristics.
3.3. Multimodal MRI Data Fusion
Then, we fuse the multimodal MRI data based on the MCB pooling method for MDD diagnostic classification. The results can be seen in
Table 2 and
Figure 2. The models trained by fusing multimodal MRI data significantly improved the performance compared to models trained on single-modality MRI data. Specifically, the averaged ACC of the pFL models trained by our proposed method reached 79.07%, approaching but not reaching the 80% accuracy threshold regarded as clinically relevant.
3.4. Control Analysis
To comprehensively evaluate the effectiveness of the proposed pF-GMCO algorithm under various conditions, we conducted a series of controlled experiments focusing on three critical variables: adaptive aggregation weighting, the privacy–utility trade-off, and robustness against dishonest federated participants.
The adaptive aggregation strategy in pF-GMCO is designed to utilize domain-invariant information by quantifying the similarity of the data distribution across different clients. As illustrated in
Figure 3, the assigned aggregation weights of each client are strongly correlated with the similarity. Specifically, pF-GMCO effectively prioritizes information from clients with higher similarity, enhancing the relevance and quality of federated updating. This demonstrates the capacity to integrate multiclient available knowledge dynamically and meaningfully.
Then, we test different values of the variance of the Gaussian noise used for encryption. A larger variance means a higher degree of privacy preservation, but it also blurs the optimization directions, which may reduce the performance of pFL models. We test four variance values, i.e., 0, 0.1, 0.01, and 0.001, and the performance can be seen in Figure 4a. The results indicate that the degree of encryption yields a trade-off between the performance of diagnostic classification and data privacy preservation. We choose a variance of 0.001 in our pFL algorithm.
Considering the dishonest participant sites in the federation, we further test the influence of using fake federated information in our pFL algorithm. We simulate scenarios in which all federated sites share the fake information for the federated training process, and the results can be seen in
Figure 4b. Specifically, we generate random fake information in each site and share it with the other sites for federated updating. Even so, our pF-GMCO algorithm sustained high performance across all federated clients. This suggests that our algorithm can mitigate the impact of invalid federated information and maintain reliable performance even in non-ideal federation environments.
3.5. Ablation Study
To verify the effectiveness of our pF-GMCO algorithm, we conduct ablation studies using multiclient sMRI data for MDD diagnostic classification, and the results are shown in
Table 3. Federated adaptive aggregation can utilize domain-invariant information from multiple sites based on the similarity of the data distributions, which improves the performance of FL models. The FL models trained by the adaptive aggregation method outperform those trained by FedAvg, which indicates that accounting for domain shift in the FL training process improves model performance. Moreover, federated optimization based on the constraints of gradient matching and model contrastive loss further optimizes FL models towards local personalization. When all personalized federated strategies are applied, the diagnostic classification abilities of the pFL models are significantly enhanced.
3.6. Most Discriminative ROIs and Visualization Analysis
To identify the most discriminative regions of interest (ROIs) for the diagnostic classification of major depressive disorder (MDD), we construct occlusion sensitivity maps for different types of MRI data [
39]. The maps are obtained by replacing volumetric patches within the 3D MRI feature maps with zero-value voxels of an equivalent size and evaluating the corresponding decrease in the prediction accuracy. A greater reduction in accuracy upon the occlusion of a specific ROI indicates higher discriminative power. Note that the ROI heat map is generated only based on the data-driven method, and we do not consider age, sex, the first episode or recurrence, and other clinical factors.
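The occlusion procedure can be sketched on a toy volume as follows. This is an illustrative simplification: the volume is a nested list, non-overlapping cubic patches are zeroed one at a time, and a generic scoring function stands in for the prediction accuracy of a trained model. The names `occlusion_map` and `score_fn` are hypothetical.

```python
def occlusion_map(volume, score_fn, patch=2):
    """Drop in score_fn when each non-overlapping patch**3 block is zeroed.

    Returns a dict mapping the block origin (x0, y0, z0) to the score drop.
    """
    base = score_fn(volume)
    dx, dy, dz = len(volume), len(volume[0]), len(volume[0][0])
    drops = {}
    for x0 in range(0, dx, patch):
        for y0 in range(0, dy, patch):
            for z0 in range(0, dz, patch):
                # Copy the volume so each occlusion starts from the original.
                occluded = [[row[:] for row in plane] for plane in volume]
                for x in range(x0, min(x0 + patch, dx)):
                    for y in range(y0, min(y0 + patch, dy)):
                        for z in range(z0, min(z0 + patch, dz)):
                            occluded[x][y][z] = 0.0
                drops[(x0, y0, z0)] = base - score_fn(occluded)
    return drops
```

Blocks whose occlusion produces the largest drop are the ones the scoring function depends on most, which is the criterion used to rank discriminative regions.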
The average occlusion sensitivity maps for each MRI modality are presented in
Figure 5.
Our results show that gray matter volumes (GMVs) are significantly reduced in several brain regions, including the temporal cortex, angular cortex, hippocampus, posterior cingulate cortex, orbitofrontal cortex, superior frontal cortex, superior parietal cortex, supplementary motor area (SMA), and certain cerebellar subregions. These alterations are consistent with previously reported disruptions in the cortico-limbic-cerebellar circuit in MDD patients [
40,
41]. Additionally, we observed significant decreases in the amplitudes of low-frequency fluctuations (ALFF) and regional homogeneity (ReHo) within these regions.
Next, we use the two-sample
t-test method to locate the most discriminative brain regions for MDD diagnostic classification. We extract the gray matter volume, ALFF, and ReHo values of the most discriminative ROIs from all subjects in multiple sites and construct averaged maps. The
t-test results reveal significant GMV differences between MDD patients and HCs (
p < 0.05, FDR-corrected), including in the bilateral anterior and posterior cerebellar cortices, left middle temporal cortex, right superior temporal cortex, bilateral inferior orbitofrontal cortex, superior and inferior frontal cortices, left hippocampus, left lingual cortex, left insula, bilateral precentral cortex, right angular gyrus, and left supplementary motor area. Furthermore, the ReHo values are significantly decreased in the left hippocampus, left amygdala, bilateral superior frontal cortex, left middle cingulate cortex, SMA, and inferior parietal cortex. ALFF reductions are prominent in the left middle and superior temporal cortices, left posterior cingulate cortex, and precuneus (all
p < 0.05, FDR-corrected; see
Table 3 and
Figure 6).
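The statistical comparison can be sketched as follows. The sketch makes two assumptions that the text does not confirm: the two-sample t statistic uses a pooled variance (Student's form), and the FDR correction follows the Benjamini-Hochberg step-up procedure. Converting t statistics to p-values is omitted, and the function names are hypothetical.

```python
import math


def two_sample_t(x, y):
    """Student's t statistic with pooled variance (equal-variance assumption)."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    ssx = sum((v - mx) ** 2 for v in x)
    ssy = sum((v - my) ** 2 for v in y)
    pooled = (ssx + ssy) / (nx + ny - 2)
    return (mx - my) / math.sqrt(pooled * (1 / nx + 1 / ny))


def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking the p-values that survive FDR control."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            cutoff = rank  # keep the largest rank that passes its threshold
    rejected = [False] * m
    for i in order[:cutoff]:
        rejected[i] = True
    return rejected
```

Because the procedure is step-up, a p-value that misses its own threshold can still be declared significant when a larger p-value further down the ranking passes.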
These findings indicate that the GMVs are notably reduced in the superior temporal cortex, inferior temporal cortex, posterior cingulate cortex, orbital frontal cortex, superior frontal cortex, superior parietal cortex, supplementary motor area, and bilateral cerebellum posterior lobe, which might affect the cortico-limbic-cerebellar circuit in MDD patients [
40,
41,
42]. We also find a significant decrease in the ALFF and ReHo values in these brain regions.
According to the visualization results, the most discriminative ROIs are primarily located in the frontoparietal network (FPN) and default mode network (DMN). Reductions in GMV and functional activity within these networks may impair their efficiency, thereby affecting perceptual switching and cognitive function in MDD patients. Furthermore, the supplementary motor area is considered the necessary cortical area for voluntary movement and also participates in cognitive activity [
43]. Alterations of the SMA may be implicated in psychomotor retardation, which is a key feature in MDD patients [
44]. Additionally, changes were observed in other regions critical for high-order cognitive and emotional functioning, such as decreased GMVs in the left occipital lobe and left parahippocampal gyrus, as well as reduced GMVs and ALFF in the precuneus.
The visualization analysis has verified the physiological interpretability of our pFL models, confirming that the proposed method not only achieves high diagnostic performance but also identifies neurobiologically meaningful biomarkers aligned with the established MDD pathophysiology.
4. Discussion
This study proposes the pF-GMCO algorithm for MDD diagnostic classification, offering a personalized federated learning framework that facilitates privacy-preserving, multisite collaboration based on adaptive gradient aggregation and contrastive optimization. The adaptive gradient aggregation mechanism in pF-GMCO incorporates domain-invariant features from multiple sites by evaluating data distribution similarities, offering a robust strategy for integrating global knowledge while mitigating domain shifts. Furthermore, federated gradient matching and model contrastive loss enable the models to be optimized towards site-specific personalization, without sacrificing global representational coherence. This unique algorithmic design ensures that each participant benefits from the collective training process while preserving localized optimization trajectories. Then, multimodal MRI data are integrated via compact bilinear pooling, which captures complex cross-modal interactions and enhances the discriminative power of the feature representation. Our approach achieves state-of-the-art performance in MDD classification, with accuracy of 79.07% across 23 sites and 2293 subjects. Notably, the identified discriminative biomarkers align with the established MDD pathophysiology, providing not only high classification accuracy but also neurobiologically meaningful interpretation.
Although the proposed method achieves promising performance in MDD diagnostic classification, this study has several limitations. Firstly, to ensure a fair and efficient comparison across different federated learning algorithms, we employed the 3D-ResNet-10 architecture as the backbone for all models. Although this backbone has few trainable parameters and is easy to optimize, its relatively simple structure may limit its capacity to capture and encode complex neuroimaging features. Future work could explore more specialized backbone architectures for feature extraction.
Secondly, to mitigate the effects of the extremely limited sample sizes at many real-world medical sites, we grouped several sites into larger clients. While this grouping may improve local training stability, it may also introduce intraclient heterogeneity and impair local optimization towards each site’s data distribution. Reducing the heterogeneity within aggregated clients deserves further attention.
Thirdly, although the ALFF and ReHo features contributed to improving MDD classification, models trained solely on these fMRI-derived measures still underperform and remain below clinical application thresholds. ALFF and ReHo data may result in the loss of fine-grained temporal information about dynamic functional activity in 4D rs-fMRI data, which could constrain the discriminative power of the resulting features. Moreover, the rs-fMRI data used in this study were acquired at non-standardized times across sites. Since diurnal rhythms affect neuroimaging metrics [
45], this temporal heterogeneity may add variability to ALFF and ReHo features. Nevertheless, despite these inherent limitations, ALFF and ReHo still provide valuable functional perspectives that are complementary to structural information. Future work should also record scan times and participant chronotypes and develop temporal harmonization methods for federated learning. Furthermore, as the pharmacological therapy status is a common and significant confounder in MRI-based MDD diagnosis, future studies must prioritize the detailed phenotyping of patients’ medication histories and dosages. This will enable the training of diagnostic models on more finely stratified subgroups, which is essential in disentangling the neural correlates of MDD from the effects of its treatment.
Finally, we only tested our proposed pF-GMCO method on an MDD diagnostic classification task in our study, but it also has potential in the diagnostic classification of multiple neuropsychiatric disorders, which needs to be explored in the future.
In summary, pF-GMCO provides a powerful tool for privacy-preserving, multisite multimodal MRI data analysis through the innovative combination of adaptive federation, personalized optimization, and multimodal integration. The method not only achieves superior diagnostic classification performance but also finds interpretable potential biomarkers for MDD. Our future work would benefit from collecting more independent medical sites and constructing federated learning applications in cross-site clinical scenarios, with significant potential to extend to multiple neuropsychiatric disorders’ diagnosis.