Externally Validated Deep Learning Analysis of Chest Radiographs for Differentiating COVID-19 and Viral Pneumonia
Abstract
1. Introduction
2. Materials and Methods
2.1. Study Design and Reporting Framework
- (i) model training and internal validation using harmonized publicly available datasets;
- (ii) robustness assessment via stratified patient-level cross-validation; and
- (iii) independent performance evaluation using a real-world institutional dataset from Adan Hospital, Kuwait.
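To illustrate what "stratified patient-level cross-validation" implies in practice, the following is a minimal sketch rather than the authors' implementation. It assumes each patient carries a single class label and that records arrive as `(image_id, patient_id, label)` tuples; the function name `stratified_patient_folds` and the record layout are hypothetical. The key property is that all images from one patient land in the same fold, so no patient contributes to both training and validation within a split.

```python
from collections import defaultdict


def stratified_patient_folds(records, n_folds=5):
    """Map each image to a fold so that patients never straddle folds.

    records: list of (image_id, patient_id, label) tuples.
    Assumption (not from the paper): every patient has exactly one label.
    """
    # Group patient IDs by class so each class is spread over the folds.
    patients_by_label = defaultdict(set)
    for _, patient_id, label in records:
        patients_by_label[label].add(patient_id)

    # Round-robin assignment within each class; sorting keeps it deterministic.
    patient_fold = {}
    for label in sorted(patients_by_label):
        for i, patient_id in enumerate(sorted(patients_by_label[label])):
            patient_fold[patient_id] = i % n_folds

    # Every image inherits the fold of its patient.
    return {image_id: patient_fold[patient_id] for image_id, patient_id, _ in records}
```

A round-robin assignment is only one simple way to balance classes; a production pipeline would more likely rely on an established library splitter and shuffle patients with a fixed seed.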
2.2. Data Sources and Dataset Composition
2.2.1. Public Training and Validation Datasets
- (i) duplicate images;
- (ii) pediatric cases;
- (iii) lateral or non-frontal projections;
- (iv) poor diagnostic quality (e.g., motion artifact, severe under- or overexposure); or
- (v) missing or ambiguous class labels.
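The exclusion criteria above can be expressed as a simple filtering pass. The sketch below is illustrative only: the `ImageRecord` fields, the content-hash duplicate check, the age cutoff of 18 years, and the decision to keep records with unknown age are all assumptions, not details taken from the study.

```python
from dataclasses import dataclass
from typing import Optional

VALID_LABELS = {"Normal", "Viral pneumonia", "COVID-19"}
FRONTAL_PROJECTIONS = {"PA", "AP"}
ADULT_AGE_YEARS = 18  # assumed cutoff for "pediatric"


@dataclass
class ImageRecord:
    image_hash: str            # hypothetical content hash used to detect duplicates
    age_years: Optional[float]
    projection: str            # e.g. "PA", "AP", "LATERAL"
    quality_ok: bool           # result of a manual or automated quality check
    label: Optional[str]


def exclusion_reason(rec, seen_hashes):
    """Return the first matching exclusion criterion, or None if the record is kept."""
    if rec.image_hash in seen_hashes:
        return "duplicate"
    if rec.age_years is not None and rec.age_years < ADULT_AGE_YEARS:
        return "pediatric"
    if rec.projection not in FRONTAL_PROJECTIONS:
        return "non-frontal projection"
    if not rec.quality_ok:
        return "poor quality"
    if rec.label not in VALID_LABELS:
        return "missing or ambiguous label"
    return None


def apply_exclusions(records):
    """Split records into kept images and (record, reason) exclusions."""
    kept, excluded, seen_hashes = [], [], set()
    for rec in records:
        reason = exclusion_reason(rec, seen_hashes)
        if reason is None:
            seen_hashes.add(rec.image_hash)
            kept.append(rec)
        else:
            excluded.append((rec, reason))
    return kept, excluded
```

Recording the reason for each exclusion makes it straightforward to report how many images were dropped per criterion.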
2.2.2. Institutional Test Dataset (Independent External Evaluation)
2.3. Image Preprocessing and Data Augmentation
2.4. Model Architecture
2.5. Model Training and Optimization
2.6. Cross-Validation Strategy and Reproducibility
2.7. Performance Metrics and Statistical Analysis
3. Results
3.1. Model Training Dynamics and Cross-Validation Stability
3.2. Validation Dataset Classification
3.3. Predictive Values and Probability Distributions (Validation Dataset)
3.4. Calibration and Reliability Assessment (Validation Dataset)
3.5. Decision Curve Analysis (Validation Dataset)
3.6. Performance on the Independent Institutional Test Dataset
3.7. Predictive Values and Calibration (Institutional Dataset)
3.8. Decision Curve Analysis (Institutional Dataset)
3.9. Prediction Distribution and Confidence Analysis
3.10. Precision–Recall Analysis (Institutional Dataset)
4. Discussion
Limitations and Future Directions
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| CNNs | Convolutional Neural Networks |
| CXR | Chest X-rays |
| AI-CXR.NET | Artificial Intelligence Chest X-ray Network |
| NPVs | Negative Predictive Values |
| PPVs | Positive Predictive Values |
| AP | Average Precision |
| PR | Precision–Recall |
| DCA | Decision Curve Analysis |
| RSNA | Radiological Society of North America |
| NIH | National Institutes of Health |
| PHC | Primary Health Care |










| Actual\Predicted | Normal | Viral Pneumonia | COVID-19 | Total |
|---|---|---|---|---|
| Normal | 247 | 2 | 1 | 250 |
| Viral pneumonia | 7 | 227 | 1 | 235 |
| COVID-19 | 5 | 11 | 346 | 362 |
| Total | 259 | 240 | 348 | 847 |

| Class | Precision % | Recall % | Specificity % | F1-Score % | Support |
|---|---|---|---|---|---|
| Normal | 95.37 (92.1–97.3) | 98.8 (96.5–99.6) | 97.9 (96.5–98.8) | 97.1 | 250 |
| Viral pneumonia | 94.6 (90.9–96.8) | 96.6 (93.4–98.3) | 97.88 (96.4–98.7) | 95.6 | 235 |
| COVID-19 | 99.43 (97.9–99.8) | 95.58 (92.94–97.3) | 99.59 (98.5–99.9) | 97.5 | 362 |
| Weighted average | 96.9 | 96.8 | 98.6 | 96.8 | |
| Macro average | 96.4 | 96.9 | 98.5 | 96.7 | |

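The per-class figures in the table above follow from the validation confusion matrix (847 images) by one-vs-rest counting. The sketch below shows that arithmetic; the function name `per_class_metrics` and the dictionary layout are illustrative choices, and small rounding differences from the published values are possible.

```python
LABELS = ["Normal", "Viral pneumonia", "COVID-19"]

# Validation confusion matrix: rows are actual classes, columns are predicted.
VALIDATION_CM = [
    [247, 2, 1],
    [7, 227, 1],
    [5, 11, 346],
]


def per_class_metrics(cm, labels):
    """Compute one-vs-rest precision, recall, specificity and F1 per class."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    results = {}
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # actual k, predicted otherwise
        fp = sum(cm[r][k] for r in range(n)) - tp   # predicted k, actually otherwise
        tn = total - tp - fn - fp
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        specificity = tn / (tn + fp)
        f1 = 2 * precision * recall / (precision + recall)
        results[labels[k]] = {
            "precision": precision,
            "recall": recall,
            "specificity": specificity,
            "f1": f1,
            "support": tp + fn,
        }
    return results


if __name__ == "__main__":
    for name, m in per_class_metrics(VALIDATION_CM, LABELS).items():
        print(name, {key: round(100 * val, 2) if key != "support" else val
                     for key, val in m.items()})
```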
| Actual\Predicted | Normal | Viral Pneumonia | COVID-19 | Total |
|---|---|---|---|---|
| Normal | 164 | 7 | 1 | 172 |
| Viral pneumonia | 2 | 52 | 5 | 59 |
| COVID-19 | 3 | 2 | 84 | 89 |
| Total | 169 | 61 | 90 | 320 |

| Class | Precision % | Recall % | Specificity % | F1-Score % | Support |
|---|---|---|---|---|---|
| Normal | 97.0 (93.3–98.7) | 95.6 (91.1–97.6) | 96.6 (92.3–98.6) | 96.2 | 172 |
| Viral pneumonia | 85.2 (74.3–92.0) | 88.1 (77.5–94.1) | 96.6 (93.6–98.2) | 86.7 | 59 |
| COVID-19 | 93.3 (86.2–96.9) | 94.4 (87.5–97.6) | 97.4 (94.5–98.8) | 93.8 | 89 |
| Weighted average | 93.8 | 93.7 | 96.8 | 93.8 | |
| Macro average | 91.9 | 92.6 | 96.8 | 92.2 | |

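The two summary rows differ only in how the per-class values are combined: the macro average weights every class equally, whereas the weighted average weights each class by its support. With the institutional dataset's class imbalance (172 / 59 / 89), the smaller viral pneumonia class pulls the macro figure down more than the weighted one. A minimal sketch for precision, using the institutional confusion matrix (helper names are illustrative):

```python
# Institutional confusion matrix: rows are actual classes, columns are predicted.
INSTITUTIONAL_CM = [
    [164, 7, 1],   # Normal
    [2, 52, 5],    # Viral pneumonia
    [3, 2, 84],    # COVID-19
]


def precision_per_class(cm):
    """Precision for class k = correct predictions of k / all predictions of k."""
    n = len(cm)
    return [cm[k][k] / sum(cm[r][k] for r in range(n)) for k in range(n)]


def macro_average(values):
    """Unweighted mean: every class counts equally."""
    return sum(values) / len(values)


def weighted_average(values, supports):
    """Support-weighted mean: larger classes dominate."""
    return sum(v * s for v, s in zip(values, supports)) / sum(supports)


if __name__ == "__main__":
    supports = [sum(row) for row in INSTITUTIONAL_CM]   # [172, 59, 89]
    precisions = precision_per_class(INSTITUTIONAL_CM)
    print("macro    %.1f" % (100 * macro_average(precisions)))
    print("weighted %.1f" % (100 * weighted_average(precisions, supports)))
```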
| Class | PPV % | NPV % |
|---|---|---|
| Normal | 97.0 (93.3–98.7) | 94.7 (90.9–97.2) |
| Viral pneumonia | 85.2 (74.3–92.0) | 97.3 (94.6–98.9) |
| COVID-19 | 93.3 (86.2–96.9) | 97.8 (95.1–99.1) |
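The predictive values above can be recomputed from the institutional confusion matrix (320 images). The sketch below also attaches a 95% confidence interval using the Wilson score method. The interval method is not stated in this section, so Wilson is an assumption here and its bounds need not match every tabulated bound; function names are illustrative.

```python
import math

LABELS = ["Normal", "Viral pneumonia", "COVID-19"]

# Institutional confusion matrix: rows are actual classes, columns are predicted.
INSTITUTIONAL_CM = [
    [164, 7, 1],
    [2, 52, 5],
    [3, 2, 84],
]


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion (assumed CI method)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def predictive_value_counts(cm, k):
    """Return ((TP, predicted positive), (TN, predicted negative)) for class k."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    predicted_pos = sum(cm[r][k] for r in range(n))
    predicted_neg = total - predicted_pos
    fn = sum(cm[k]) - tp
    tn = predicted_neg - fn
    return (tp, predicted_pos), (tn, predicted_neg)


if __name__ == "__main__":
    for k, name in enumerate(LABELS):
        (tp, pos), (tn, neg) = predictive_value_counts(INSTITUTIONAL_CM, k)
        ppv_lo, ppv_hi = wilson_interval(tp, pos)
        npv_lo, npv_hi = wilson_interval(tn, neg)
        print("%s: PPV %.1f (%.1f-%.1f), NPV %.1f (%.1f-%.1f)" % (
            name, 100 * tp / pos, 100 * ppv_lo, 100 * ppv_hi,
            100 * tn / neg, 100 * npv_lo, 100 * npv_hi))
```

Because PPV and NPV depend on class prevalence, values computed this way describe this dataset's case mix and would shift in a population with a different proportion of COVID-19 or viral pneumonia cases.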
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Masoomi, M.; Al-Kandari, L.; Ramzy, H.; Hamza, M.A. Externally Validated Deep Learning Analysis of Chest Radiographs for Differentiating COVID-19 and Viral Pneumonia. Diagnostics 2026, 16, 995. https://doi.org/10.3390/diagnostics16070995
Masoomi M, Al-Kandari L, Ramzy H, Hamza MA. Externally Validated Deep Learning Analysis of Chest Radiographs for Differentiating COVID-19 and Viral Pneumonia. Diagnostics. 2026; 16(7):995. https://doi.org/10.3390/diagnostics16070995
Chicago/Turabian Style: Masoomi, Michael, Latifa Al-Kandari, Haytam Ramzy, and Mahday Abass Hamza. 2026. "Externally Validated Deep Learning Analysis of Chest Radiographs for Differentiating COVID-19 and Viral Pneumonia" Diagnostics 16, no. 7: 995. https://doi.org/10.3390/diagnostics16070995
APA Style: Masoomi, M., Al-Kandari, L., Ramzy, H., & Hamza, M. A. (2026). Externally Validated Deep Learning Analysis of Chest Radiographs for Differentiating COVID-19 and Viral Pneumonia. Diagnostics, 16(7), 995. https://doi.org/10.3390/diagnostics16070995

