Scalable Unimodal and Multimodal Deep Learning for Multi-Label Chest Disease Detection: A Comparative Analysis
Abstract
1. Introduction
- Perform a comparative analysis by systematically evaluating three CNN architectures (ResNet50, EfficientNetB3, DenseNet121) in both unimodal and multimodal configurations.
- Observe how the models behave on a multi-label classification task, rather than a binary one, across data subsets of different sizes drawn from the same dataset.
- Show, through minimum loss values and per-pathology AUC scores, that these performance issues can be mitigated, helping to close gaps in the literature such as limited modality interaction, interpretability, and explainability.
- Contribute to the applicability of multi-label multimodal deep learning in the medical field, emphasizing the importance of innovative models and clinical artificial intelligence decision support systems.
- Evaluate the relationship between data quantity and modality integration by examining how unimodal and multimodal models generalize across data scales under the same architectures and training procedures.
2. Materials and Methods
2.1. Dataset and Preprocessing Steps
2.2. Unimodal and Multimodal Models Preparation on the 5606-Sample Dataset
2.2.1. Unimodal Models
2.2.2. Multimodal Models
2.3. Unimodal and Multimodal Models Preparation on the NIH Chest X-Ray Dataset
2.3.1. Unimodal Models
2.3.2. Multimodal Models
3. Results
3.1. Results of Unimodal and Multimodal Models on the 5606-Sample Dataset
3.1.1. Unimodal Models
3.1.2. Multimodal Models
3.2. Results of Unimodal and Multimodal Models in the NIH Chest X-Ray Dataset
3.2.1. Unimodal Models
3.2.2. Multimodal Models
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest









| Columns of Clinical Data | Sample Data |
|---|---|
| Image Index | 00000013_005.png |
| Finding Labels | Emphysema\|Infiltration\|Pleural_Thickening\|Pneumothorax |
| Follow up# | 005 |
| Patient ID | 00000013 |
| Patient Age | 060Y |
| Patient Gender | M |
| View Position | AP |
| OriginalImageWidth | 3056 |
| OriginalImageHeight | 2544 |
| OriginalImagePixelSpacing_x | 0.139 |
| OriginalImagePixelSpacing_y | 0.139 |

| Column | Non-Null Count | Dtype |
|---|---|---|
| Image Index | 5606 non-null | Object |
| Finding Labels | 5606 non-null | Object |
| Follow-up | 5606 non-null | Int64 |
| Patient ID | 5606 non-null | Int64 |
| PatientAge | 5606 non-null | Int64 |
| OriginalImageWidth | 5606 non-null | Int64 |
| OriginalImageHeight | 5606 non-null | Int64 |
| OriginalImagePixelSpacing_x | 5606 non-null | Float64 |
| OriginalImagePixelSpacing_y | 5606 non-null | Float64 |
| Patient Gender_M | 5606 non-null | Bool |
| View Position_AP | 5606 non-null | Bool |
| View Position_PA | 5606 non-null | Bool |
| Atelectasis | 5606 non-null | Int64 |
| Cardiomegaly | 5606 non-null | Int64 |
| Edema | 5606 non-null | Int64 |
| Effusion | 5606 non-null | Int64 |
| Emphysema | 5606 non-null | Int64 |
| Fibrosis | 5606 non-null | Int64 |
| Hernia | 5606 non-null | Int64 |
| Infiltration | 5606 non-null | Int64 |
| … | … | … |
| Pneumonia | 5606 non-null | Int64 |
| Pneumothorax | 5606 non-null | Int64 |
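
As an illustration of how a raw metadata row could be turned into the encoded columns listed above, the following minimal sketch parses the age string, one-hot encodes gender and view position, and multi-hot encodes the pipe-separated findings. The function name `encode_record`, the `DISEASES` list (assumed here to be the 14 pathology labels of the NIH dataset), and the dictionary layout are hypothetical choices made for illustration; the preprocessing code actually used in the study is not shown in this section.

```python
# Illustrative sketch only: names and layout are assumptions, not the study's code.
DISEASES = [
    "Atelectasis", "Cardiomegaly", "Consolidation", "Edema", "Effusion",
    "Emphysema", "Fibrosis", "Hernia", "Infiltration", "Mass", "Nodule",
    "Pleural_Thickening", "Pneumonia", "Pneumothorax",
]


def encode_record(raw: dict) -> dict:
    """Convert one raw metadata row into numeric/boolean features and multi-hot labels."""
    # "Finding Labels" holds one or more labels separated by "|".
    findings = set(raw["Finding Labels"].split("|"))
    encoded = {
        # Age is stored as a string such as "060Y"; strip the unit and cast.
        "Patient Age": int(raw["Patient Age"].rstrip("Yy")),
        # One-hot style boolean columns for the categorical fields.
        "Patient Gender_M": raw["Patient Gender"] == "M",
        "View Position_AP": raw["View Position"] == "AP",
        "View Position_PA": raw["View Position"] == "PA",
    }
    # Multi-hot encoding: one 0/1 column per pathology.
    for disease in DISEASES:
        encoded[disease] = int(disease in findings)
    return encoded


# The sample row from the clinical data table.
sample = {
    "Finding Labels": "Emphysema|Infiltration|Pleural_Thickening|Pneumothorax",
    "Patient Age": "060Y",
    "Patient Gender": "M",
    "View Position": "AP",
}
encoded = encode_record(sample)
```

For the sample row, the four listed findings become ones in their respective columns and every other pathology column is zero, which is the multi-label target format that a sigmoid output layer expects.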

| Disease Labels | Random Sample of NIH Chest X-Ray | NIH Chest X-Ray 14 |
|---|---|---|
| No Finding | 3044 | 60,361 |
| Infiltration | 967 | 19,894 |
| Effusion | 644 | 13,317 |
| Atelectasis | 508 | 11,559 |
| Nodule | 313 | 6331 |
| Mass | 284 | 5782 |
| Pneumothorax | 271 | 5302 |
| Consolidation | 226 | 4667 |
| Pleural_Thickening | 176 | 3385 |
| Cardiomegaly | 141 | 2776 |
| Emphysema | 127 | 2516 |
| Edema | 118 | 2303 |
| Fibrosis | 84 | 1686 |
| Pneumonia | 62 | 1431 |
| Hernia | 13 | 227 |
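
The counts above can be used to check how closely the random sample follows the label distribution of the full dataset. The short sketch below, with the counts copied from the table and purely illustrative variable names, computes the sample-to-full ratio for each label and the imbalance between the most and least frequent labels in the sample.

```python
# Label counts copied from the table: (random sample, full NIH Chest X-Ray 14).
counts = {
    "No Finding": (3044, 60361),
    "Infiltration": (967, 19894),
    "Effusion": (644, 13317),
    "Atelectasis": (508, 11559),
    "Nodule": (313, 6331),
    "Mass": (284, 5782),
    "Pneumothorax": (271, 5302),
    "Consolidation": (226, 4667),
    "Pleural_Thickening": (176, 3385),
    "Cardiomegaly": (141, 2776),
    "Emphysema": (127, 2516),
    "Edema": (118, 2303),
    "Fibrosis": (84, 1686),
    "Pneumonia": (62, 1431),
    "Hernia": (13, 227),
}

# Fraction of each label's full-dataset occurrences that appear in the sample.
ratios = {label: sample / full for label, (sample, full) in counts.items()}

# Imbalance inside the sample: most frequent versus least frequent label.
sample_counts = {label: sample for label, (sample, _) in counts.items()}
imbalance = max(sample_counts.values()) / min(sample_counts.values())

for label, ratio in ratios.items():
    print(f"{label:<20} {ratio:.4f}")
print(f"max/min label count in sample: {imbalance:.1f}")
```

All ratios fall between roughly 0.04 and 0.06, so the sample preserves the relative label frequencies, including the strong imbalance between "No Finding" and rare labels such as Hernia.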

| Modality | Fusion Strategy | Rate | Mean AUC | Standard Deviation |
|---|---|---|---|---|
| Unimodal Image | - | - | 0.4750 | 0.0910 |
| Multimodal | Late Fusion | (0.50–0.50) | 0.5624 | 0.0801 |
| Multimodal | Late Fusion | (0.40–0.60) | 0.5600 | 0.0798 |
| Multimodal | Late Fusion | (Max Fusion) | 0.5693 | 0.0887 |
| Multimodal | Late Fusion | (Min Fusion) | 0.4930 | 0.0954 |
| Multimodal | Feature-Level Fusion | (0.50–0.50) | 0.7106 | 0.0814 |
| Multimodal | Feature-Level Fusion | (0.40–0.60) | 0.7144 | 0.0751 |
| Multimodal | Feature-Level Fusion | (Max Fusion) | 0.7447 | 0.0813 |
| Multimodal | Feature-Level Fusion | (Min Fusion) | 0.7273 | 0.0883 |
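
The late-fusion rows above combine the per-label probabilities produced by the image and clinical branches. A minimal sketch of the four listed rules (two weighted averages, element-wise maximum, and element-wise minimum) could look as follows. The function name `late_fusion`, the example probabilities, and the assumption that the first weight applies to the image branch are hypothetical; the table does not state which modality receives which weight.

```python
def late_fusion(p_image, p_clinical, strategy="weighted", w_image=0.5):
    """Fuse two lists of per-label probabilities at the decision level.

    strategy: "weighted" (convex combination), "max", or "min".
    w_image:  weight of the image branch; the clinical branch gets 1 - w_image.
              Which branch receives which weight is an assumption of this sketch.
    """
    if len(p_image) != len(p_clinical):
        raise ValueError("both branches must predict the same number of labels")
    if strategy == "weighted":
        w_clinical = 1.0 - w_image
        return [w_image * a + w_clinical * b for a, b in zip(p_image, p_clinical)]
    if strategy == "max":
        return [max(a, b) for a, b in zip(p_image, p_clinical)]
    if strategy == "min":
        return [min(a, b) for a, b in zip(p_image, p_clinical)]
    raise ValueError(f"unknown strategy: {strategy}")


# Made-up sigmoid outputs for three labels from each branch.
p_img = [0.2, 0.8, 0.5]
p_clin = [0.6, 0.4, 0.5]

fused_equal = late_fusion(p_img, p_clin, "weighted", w_image=0.5)
fused_40_60 = late_fusion(p_img, p_clin, "weighted", w_image=0.4)
fused_max = late_fusion(p_img, p_clin, "max")
fused_min = late_fusion(p_img, p_clin, "min")
```

Because each label is fused independently, the same function applies unchanged to any number of pathologies; the fused probabilities can then be scored per label with AUC.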
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Orhan, D.; Ucan, M.; Alhajj, R.; Kaya, M. Scalable Unimodal and Multimodal Deep Learning for Multi-Label Chest Disease Detection: A Comparative Analysis. Diagnostics 2026, 16, 734. https://doi.org/10.3390/diagnostics16050734

