Benchmarking Multimodal Deep Fusion Strategies for Heterogeneous Neuroimaging and Cognitive Data Using a Controlled Sex Classification Task
Highlights
- Early feature-level fusion consistently outperforms intermediate and late fusion strategies in multimodal sex classification.
- Standard feature scaling significantly enhances multimodal deep learning performance across architectures.
- Architectural complexity does not guarantee superior performance in heterogeneous multimodal integration.
- Fusion strategy and preprocessing must be jointly optimized for reliable and reproducible multimodal modeling in neuroscience.
Abstract
1. Introduction
- (i) We provide a systematic and controlled benchmark of multimodal deep learning fusion strategies, comparing early, intermediate, and late fusion approaches within a unified experimental framework;
- (ii) We explicitly investigate the impact of feature scaling on multimodal model performance, highlighting its critical role in heterogeneous data integration;
- (iii) We implement and evaluate all models within a consistent and reproducible pipeline, enabling fair comparison across architectures;
- (iv) We provide practical insights into the design of multimodal learning systems for neuroscience applications.
2. Materials and Methods
2.1. Data Preparation
2.2. Data Fusion Workflow
- Operation-based fusion (e.g., feature concatenation);
- Attention-based fusion (e.g., multi-head attention mechanisms);
- Subspace-based fusion (e.g., variational autoencoders);
- Graph-based fusion (e.g., graph neural networks).
2.2.1. Data Scaling
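As an illustration of the three scaling methods named in Algorithm 1 (Standard, Min–Max, Robust), the following is a minimal sketch using scikit-learn. The toy matrices `X_train` and `X_test` are hypothetical and are not the study data, and fitting each scaler on the training split only is an assumption about good practice rather than a description of the authors' implementation.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

# Hypothetical toy data: rows are subjects, columns are features.
# The second feature contains an outlier (500.0) to show how the scalers differ.
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 500.0]])
X_test = np.array([[3.0, 30.0]])

scalers = {
    "standard": StandardScaler(),  # (x - mean) / standard deviation
    "minmax": MinMaxScaler(),      # (x - min) / (max - min)
    "robust": RobustScaler(),      # (x - median) / interquartile range
}

scaled = {}
for name, scaler in scalers.items():
    # Estimate the scaling statistics on the training split only,
    # then apply the same transformation to the held-out split.
    scaler.fit(X_train)
    scaled[name] = (scaler.transform(X_train), scaler.transform(X_test))

for name, (train_scaled, test_scaled) in scaled.items():
    print(name, test_scaled.round(3))
```

In this toy example the outlier inflates the mean and standard deviation used by standard scaling and compresses the Min–Max range, while the median and IQR used by robust scaling are largely unaffected.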
2.2.2. Deep Learning Models for Unimodal and Multimodal Fusion
Unimodal Deep Learning Models
Early Fusion Models (Feature-Level Fusion)
- Concat Tabular Data.
- Concat Tabular Feature Maps.
- Channel-Wise Multi Net.
Intermediate Fusion Models (Representation-Level Fusion)
- Tabular Crossmodal Multihead Attention.
- Activation Function and Tabular Self-Attention.
- MCVAE Tabular.
- Edge Correlation GNN.
- Attention-Weighted GNN.
Late Fusion Models (Decision-Level Fusion)
- Tabular Decision.
2.2.3. Hyperparameter Tuning and Performance Evaluation
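The evaluation metrics listed in Algorithm 1 (AUC-ROC, Accuracy, Balanced Accuracy, F1-score, Cohen's kappa, Average Precision) can be computed per fold and then averaged. The sketch below uses scikit-learn; the helper names `evaluate_fold` and `average_over_folds`, the 0.5 decision threshold, and the toy labels and probabilities are illustrative assumptions, not values taken from the study.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    average_precision_score,
    balanced_accuracy_score,
    cohen_kappa_score,
    f1_score,
    roc_auc_score,
)


def evaluate_fold(y_true, y_prob, threshold=0.5):
    """Compute the six metrics for one fold (threshold is an assumed default)."""
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    return {
        # Ranking metrics use the predicted probabilities.
        "auc_roc": roc_auc_score(y_true, y_prob),
        "average_precision": average_precision_score(y_true, y_prob),
        # Threshold-based metrics use the hard predictions.
        "accuracy": accuracy_score(y_true, y_pred),
        "balanced_accuracy": balanced_accuracy_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "cohen_kappa": cohen_kappa_score(y_true, y_pred),
    }


def average_over_folds(fold_metrics):
    """Average each metric across a list of per-fold dictionaries."""
    return {k: float(np.mean([m[k] for m in fold_metrics])) for k in fold_metrics[0]}


# Hypothetical predictions for two folds.
folds = [
    evaluate_fold([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),
    evaluate_fold([0, 1, 1, 0], [0.2, 0.9, 0.7, 0.3]),
]
summary = average_over_folds(folds)
print(summary)
```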
2.3. Pseudo-Code of the Proposed Framework
Algorithm 1. Multimodal Fusion Pipeline

Input: Cognitive features Xc, Imaging features Xi, Labels y

1. Preprocess data
   - Clean dataset
   - Split into folds (cross-validation)
2. Apply feature scaling. For each scaling method in {Standard, Min–Max, Robust}:
   - Xc_scaled ← scale(Xc)
   - Xi_scaled ← scale(Xi)
3. Unimodal modeling
   - Train model Mc on Xc_scaled
   - Train model Mi on Xi_scaled
4. Multimodal fusion. For each fusion strategy:
   - (a) Early fusion: X_fused ← concatenate(Xc_scaled, Xi_scaled); train model M_fused on X_fused
   - (b) Intermediate fusion: hc ← encoder_c(Xc_scaled); hi ← encoder_i(Xi_scaled); h_fused ← fusion_module(hc, hi); train classifier on h_fused
   - (c) Late fusion: yc ← Mc(Xc_scaled); yi ← Mi(Xi_scaled); y_fused ← combine(yc, yi)
5. Model training
   - Optimize parameters
   - Apply early stopping when available
6. Evaluation
   - Compute AUC-ROC, Accuracy, Balanced Accuracy, F1-score, Cohen’s kappa, Average Precision
   - Average results across folds

Output: Performance metrics for each fusion strategy
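To make the early and late fusion branches of Algorithm 1 concrete, here is a minimal runnable sketch. It rests on several assumptions: the data are synthetic, logistic regression stands in for the deep learning models, late fusion combines predictions by simple probability averaging, and the intermediate fusion branch is omitted. None of this reproduces the authors' actual models or results.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for the two modalities (assumption, not the study data).
rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, size=n)
Xc = rng.normal(size=(n, 5)) + 0.8 * y[:, None]          # "cognitive" features
Xi = 50.0 * rng.normal(size=(n, 8)) + 40.0 * y[:, None]  # "imaging" features, larger scale
X_fused = np.hstack([Xc, Xi])                            # early fusion: concatenation


def make_model():
    # The scaler sits inside the pipeline, so it is refit on each training fold.
    return make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))


def run_cv(n_splits=5):
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    aucs = {"cognitive": [], "imaging": [], "early": [], "late": []}
    for train, test in cv.split(Xc, y):
        # Step 3: unimodal models Mc and Mi.
        p_c = make_model().fit(Xc[train], y[train]).predict_proba(Xc[test])[:, 1]
        p_i = make_model().fit(Xi[train], y[train]).predict_proba(Xi[test])[:, 1]
        # Step 4(a): one model trained on the concatenated features.
        p_early = make_model().fit(X_fused[train], y[train]).predict_proba(X_fused[test])[:, 1]
        # Step 4(c): combine unimodal predictions (here: mean probability).
        p_late = (p_c + p_i) / 2.0
        for name, prob in [("cognitive", p_c), ("imaging", p_i),
                           ("early", p_early), ("late", p_late)]:
            aucs[name].append(roc_auc_score(y[test], prob))
    # Step 6: average the metric across folds.
    return {name: float(np.mean(values)) for name, values in aucs.items()}


results = run_cv()
print(results)
```

Placing the scaler inside the pipeline is a design choice that prevents test-fold statistics from leaking into training; replacing `StandardScaler` with `MinMaxScaler` or `RobustScaler` would give the other two scaling conditions.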
3. Results
3.1. Unimodal Deep Learning Models
3.2. Early Fusion Models
3.3. Intermediate Fusion Models
3.4. Late Fusion Models
3.5. Effect of Feature Scaling Across Fusion Strategies
4. Discussion
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| GNN | Graph Neural Network |
| HCP | Human Connectome Project |
| PSQI | Pittsburgh Sleep Quality Index |
| ER40 | Emotion Recognition Task (Penn Emotion Recognition Test) |
| ASR | Adult Self-Report |
| MMSE | Mini-Mental State Examination |
| IQR | Interquartile Range |
| AUC-ROC | Area Under the Receiver Operating Characteristic Curve |
| PR | Precision-Recall |
| MCVAE | Multi-Channel Variational Autoencoder |
| MADDi | Multimodal Attention Deep Learning Framework |




| Hyperparameter | Attention and Activation | Attention Weighted GNN | Edge Correlation GNN |
|---|---|---|---|
| Attention Reduction Ratio | 32 | — | — |
| Drop Out Probability | — | 0.1 | 0.1 |
| Threshold | — | — | 0.75 |
| Patience | — | 250 | — |

| Data | Male (n = 337) | Female (n = 410) |
|---|---|---|
| Age (in years) | 28.2 ± 3.5 | 29.9 ± 3.4 |
| Education level (in years) | 15.1 ± 1.7 | 15.3 ± 1.7 |
| MMSE | 29.1 ± 1.0 | 29.1 ± 0.9 |

| Data Fusion Model | Metric | Standard Scaling | Min–Max Scaling | Robust Scaling |
|---|---|---|---|---|
| Tab1-Uni | AUC-ROC (CI) | 0.77 (0.75–0.81) | 0.79 (0.77–0.81) | 0.78 (0.76–0.81) |
| | ACC (95% CI) | 0.70 (0.68–0.71) | 0.70 (0.69–0.71) | 0.72 (0.70–0.75) |
| Tab2-Uni | AUC-ROC (CI) | 0.91 (0.89–0.93) | 0.93 (0.93–0.94) | 0.91 (0.90–0.93) |
| | ACC (95% CI) | 0.84 (0.82–0.86) | 0.83 (0.78–0.88) | 0.84 (0.82–0.86) |
| Concat-TabData | AUC-ROC (CI) | 0.96 (0.95–0.96) | 0.94 (0.92–0.98) | 0.95 (0.94–0.95) |
| | ACC (95% CI) | 0.88 (0.86–0.91) | 0.85 (0.82–0.90) | 0.88 (0.88–0.88) |
| Concat-TabFeat | AUC-ROC (CI) | 0.90 (0.89–0.92) | 0.93 (0.91–0.95) | 0.92 (0.90–0.93) |
| | ACC (95% CI) | 0.84 (0.83–0.86) | 0.86 (0.83–0.89) | 0.85 (0.84–0.86) |
| Channel-MultiNet | AUC-ROC (CI) | 0.86 (0.86–0.89) | 0.60 (0.50–0.80) | 0.86 (0.85–0.88) |
| | ACC (95% CI) | 0.87 (0.85–0.89) | 0.58 (0.45–0.79) | 0.85 (0.84–0.86) |
| Tab-CrossMHA | AUC-ROC (CI) | 0.92 (0.89–0.92) | 0.94 (0.93–0.96) | 0.92 (0.91–0.94) |
| | ACC (95% CI) | 0.84 (0.82–0.88) | 0.87 (0.86–0.88) | 0.86 (0.85–0.88) |
| AF-TabSelfAtt | AUC-ROC (CI) | 0.79 (0.77–0.88) | 0.77 (0.75–0.79) | 0.80 (0.79–0.81) |
| | ACC (95% CI) | 0.70 (0.69–0.71) | 0.61 (0.44–0.71) | 0.71 (0.67–0.75) |
| MCVAE Tab | AUC-ROC (CI) | 0.92 (0.90–0.93) | 0.87 (0.84–0.90) | 0.91 (0.90–0.92) |
| | ACC (95% CI) | 0.83 (0.81–0.86) | 0.73 (0.68–0.79) | 0.82 (0.80–0.84) |
| EdgeCorr-GNN | AUC-ROC (CI) | 0.92 (0.90–0.94) | 0.57 (0.51–0.65) | 0.92 (0.89–0.96) |
| | ACC (95% CI) | 0.84 (0.82–0.86) | 0.45 (0.40–0.48) | 0.83 (0.80–0.86) |
| AttWeighted-GNN | AUC-ROC (CI) | 0.52 (0.49–0.56) | 0.50 (0.46–0.53) | 0.51 (0.50–0.53) |
| | ACC (95% CI) | 0.47 (0.46–0.53) | 0.45 (0.43–0.46) | 0.46 (0.43–0.49) |
| Tab-Decision | AUC-ROC (CI) | 0.91 (0.89–0.92) | 0.93 (0.92–0.94) | 0.91 (0.90–0.94) |
| | ACC (95% CI) | 0.85 (0.84–0.86) | 0.80 (0.80–0.81) | 0.83 (0.79–0.88) |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Camastra, C.; Pelagi, A.; Quattrone, A.; Sarica, A. Benchmarking Multimodal Deep Fusion Strategies for Heterogeneous Neuroimaging and Cognitive Data Using a Controlled Sex Classification Task. Brain Sci. 2026, 16, 405. https://doi.org/10.3390/brainsci16040405

