Efficient and Accurate Pneumonia Detection Using a Novel Multi-Scale Transformer Approach
Abstract
1. Introduction
- Development of a novel transformer structure that significantly reduces complexity compared with traditional transformer-based models while maintaining high performance;
- Introduction of a novel TransUNet architecture for the segmentation task, achieving a Dice score of 95.68% on the “Chest X-ray Masks and Labels” dataset;
- Introduction of a Convolutional Residual Attention Module (CRAM) that enriches feature representation by integrating multi-layer residual learning with lightweight attention mechanisms;
- Incorporation of multi-scale feature extraction, which improves performance by drawing on multiple feature spaces;
- Achievement of classification accuracies of 93.75% on the “Kermany” dataset and 96.04% on the “Cohen” dataset.
2. Related Work
2.1. Segmentation
2.1.1. U-Net for CXR Segmentation
2.1.2. U-Net Enhancements with Transformers
2.2. Classification
2.2.1. Classical Approaches for CXR Classification
2.2.2. Deep Learning Models
2.2.3. Transfer Learning
2.2.4. Ensemble Approaches
2.2.5. Transformers
3. Proposed Method
3.1. Overview
3.2. Segmentation Task
3.2.1. TransUNet Architecture
3.2.2. Training the TransUNet Model
3.2.3. Applying the Trained TransUNet to Cohen/Kermany Datasets
3.3. Classification Task
3.3.1. Backbone
3.3.2. Convolutional Residual Attention Module (CRAM)
3.3.3. Transformer
- Global Average Pooling (GAP)
- Reshaping for Key and Value
- Attention Mechanism
- Scaled Dot-Product Attention
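The steps listed above can be illustrated with a minimal sketch. It assumes, as a simplification, that the query is obtained by global average pooling over the spatial positions of a feature map, that the flattened positions serve as both keys and values, and that learned projection matrices are omitted. The function names and the toy feature map are hypothetical and are not taken from the authors' implementation.

```python
import math


def global_average_pool(tokens):
    """Average N position vectors (each of length C) into a single C-vector."""
    n = len(tokens)
    return [sum(column) / n for column in zip(*tokens)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(query, keys, values):
    """Single-query attention: softmax(q . k_i / sqrt(d_k)) weighted sum of values."""
    d_k = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k) for key in keys
    ]
    weights = softmax(scores)
    output = [
        sum(w * value[j] for w, value in zip(weights, values))
        for j in range(len(values[0]))
    ]
    return output, weights


# Toy feature map (assumption): 2x2 spatial grid flattened to 4 positions, 2 channels.
feature_map = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]

query = global_average_pool(feature_map)  # one vector summarising the map
output, weights = scaled_dot_product_attention(query, feature_map, feature_map)
```

Using one pooled query instead of one query per position reduces the attention cost from quadratic to linear in the number of spatial positions, which is one plausible way to lower transformer complexity; whether the proposed model does exactly this is not confirmed by the text available here.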
3.3.4. Output Feature Vector
3.3.5. Finding the Correct Class
3.3.6. Loss Function
4. Experimental Results
4.1. Dataset
4.2. Data Augmentation
4.3. Experimental Setting
4.4. Evaluation Metrics
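The metrics reported in the comparison tables (Dice, Accuracy, Precision, Recall, F1-Score, MCC) have standard definitions in terms of confusion-matrix counts. The sketch below uses those textbook definitions; the function name and the example counts are hypothetical, and the authors' exact evaluation code may differ (for instance in averaging across images or classes).

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Standard binary metrics from true/false positive/negative counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    # Dice = 2|A ∩ B| / (|A| + |B|); for binary masks this equals F1.
    dice = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    # Matthews correlation coefficient.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "dice": dice,
        "mcc": mcc,
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = binary_metrics(tp=90, fp=10, fn=10, tn=90)
```

Because Dice and F1 coincide for binary predictions, identical values in the Dice and F1-Score columns of the segmentation comparison are expected rather than a sign of duplicated data.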
4.5. Comparison with State-of-the-Art
4.5.1. Segmentation
4.5.2. Classification
4.6. Ablation Study
4.7. Explainable AI Through Gradient-Weighted Class Activation Mapping
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- UNICEF. Pneumonia. Available online: https://data.unicef.org/topic/child-health/pneumonia/ (accessed on 15 November 2023).
- World Health Organization. Pneumonia in Children. Available online: https://www.who.int/news-room/fact-sheets/detail/pneumonia (accessed on 8 November 2022).
- Health Poverty Action. Key Facts: Poverty and Poor Health. Available online: https://www.healthpovertyaction.org/news-events/key-facts-poverty-and-poor-health/ (accessed on 15 January 2018).
- RadiologyInfo. Chest X-Ray. Available online: https://www.radiologyinfo.org/en/info/chestrad (accessed on 17 November 2022).
- Lin, M.; Hou, B.; Mishra, S.; Yao, T.; Huo, Y.; Yang, Q.; Wang, F.; Shih, G.; Peng, Y. Enhancing thoracic disease detection using chest X-rays from PubMed Central Open Access. Comput. Biol. Med. 2023, 159, 106962. [Google Scholar] [CrossRef]
- Askari, F.; Fateh, A.; Mohammadi, M.R. Enhancing few-shot image classification through learnable multi-scale embedding and attention mechanisms. Neural Netw. 2025, 187, 107339. [Google Scholar] [CrossRef] [PubMed]
- Fateh, A.; Mohammadi, M.R.; Motlagh, M.R.J. MSDNet: Multi-Scale Decoder for Few-Shot Semantic Segmentation via Transformer-Guided Prototyping. Image Vis. Comput. 2025, 162, 105672. [Google Scholar] [CrossRef]
- Song, Z.; Wu, W.; Wu, S. Multi-Scale Convolutional Attention and Structural Re-Parameterized Residual-Based 3D U-Net for Liver and Liver Tumor Segmentation from CT. Sensors 2025, 25, 1814. [Google Scholar] [CrossRef] [PubMed]
- Junia, R.C.; Selvan, K. Deep learning-based automatic segmentation of COVID-19 in chest X-ray images using ensemble neural net sentinel algorithm. Meas. Sens. 2024, 33, 101117. [Google Scholar] [CrossRef]
- Shavkatovich Buriboev, A.; Abduvaitov, A.; Jeon, H.S. Binary Classification of Pneumonia in Chest X-Ray Images Using Modified Contrast-Limited Adaptive Histogram Equalization Algorithm. Sensors 2025, 25, 3976. [Google Scholar] [CrossRef]
- Buriboev, A.S.; Muhamediyeva, D.; Primova, H.; Sultanov, D.; Tashev, K.; Jeon, H.S. Concatenated CNN-based pneumonia detection using a fuzzy-enhanced dataset. Sensors 2024, 24, 6750. [Google Scholar] [CrossRef]
- Kanwal, K.; Asif, M.; Khalid, S.G.; Liu, H.; Qurashi, A.G.; Abdullah, S. Current diagnostic techniques for pneumonia: A scoping review. Sensors 2024, 24, 4291. [Google Scholar] [CrossRef]
- Candemir, S.; Jaeger, S.; Palaniappan, K.; Musco, J.P.; Singh, R.K.; Xue, Z.; Karargyris, A.; Antani, S.; Thoma, G.; McDonald, C.J. Lung segmentation in chest radiographs using anatomical atlases with nonrigid registration. IEEE Trans. Med. Imaging 2013, 33, 577–590. [Google Scholar] [CrossRef]
- Jaeger, S.; Karargyris, A.; Candemir, S.; Folio, L.; Siegelman, J.; Callaghan, F.; Xue, Z.; Palaniappan, K.; Singh, R.K.; Antani, S.; et al. Automatic tuberculosis screening using chest radiographs. IEEE Trans. Med. Imaging 2013, 33, 233–245. [Google Scholar] [CrossRef]
- Kermany, D.; Zhang, K.; Goldbaum, M. Labeled optical coherence tomography (oct) and chest x-ray images for classification. Mendeley Data 2018, 2, 651. [Google Scholar]
- Cohen, J.P.; Morrison, P.; Dao, L. COVID-19 image data collection. arXiv 2020, arXiv:2003.11597. [Google Scholar]
- Jennifer, J.S.; Sharmila, T.S. A neutrosophic set approach on chest X-rays for automatic lung infection detection. Inf. Technol. Control. 2023, 52, 37–52. [Google Scholar] [CrossRef]
- Fateh, A.; Rezvani, M.; Tajary, A.; Fateh, M. Persian printed text line detection based on font size. Multimed. Tools Appl. 2023, 82, 2393–2418. [Google Scholar] [CrossRef]
- Fateh, A.; Rezvani, M.; Tajary, A.; Fateh, M. Providing a voting-based method for combining deep neural network outputs to layout analysis of printed documents. J. Mach. Vis. Image Process. 2022, 9, 47–64. [Google Scholar]
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; pp. 234–241. [Google Scholar]
- Liu, W.; Luo, J.; Yang, Y.; Wang, W.; Deng, J.; Yu, L. Automatic lung segmentation in chest X-ray images using improved U-Net. Sci. Rep. 2022, 12, 8649. [Google Scholar] [CrossRef]
- Huang, H.; Lin, L.; Tong, R.; Hu, H.; Zhang, Q.; Iwamoto, Y.; Han, X.; Chen, Y.W.; Wu, J. Unet 3+: A full-scale connected unet for medical image segmentation. In Proceedings of the ICASSP 2020–2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–9 May 2020; pp. 1055–1059. [Google Scholar]
- Jha, D.; Riegler, M.A.; Johansen, D.; Halvorsen, P.; Johansen, H.D. Doubleu-net: A deep convolutional neural network for medical image segmentation. In Proceedings of the 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), Rochester, NY, USA, 28–30 July 2020; pp. 558–564. [Google Scholar]
- Oktay, O.; Schlemper, J.; Folgoc, L.L.; Lee, M.; Heinrich, M.; Misawa, K.; Mori, K.; McDonagh, S.; Hammerla, N.Y.; Kainz, B.; et al. Attention u-net: Learning where to look for the pancreas. arXiv 2018, arXiv:1804.03999. [Google Scholar] [CrossRef]
- Chen, J.; Lu, Y.; Yu, Q.; Luo, X.; Adeli, E.; Wang, Y.; Lu, L.; Yuille, A.L.; Zhou, Y. Transunet: Transformers make strong encoders for medical image segmentation. arXiv 2021, arXiv:2102.04306. [Google Scholar] [CrossRef]
- Chen, Y.; Wang, T.; Tang, H.; Zhao, L.; Zhang, X.; Tan, T.; Gao, Q.; Du, M.; Tong, T. CoTrFuse: A novel framework by fusing CNN and transformer for medical image segmentation. Phys. Med. Biol. 2023, 68, 175027. [Google Scholar] [CrossRef] [PubMed]
- Stokes, K.; Castaldo, R.; Franzese, M.; Salvatore, M.; Fico, G.; Pokvic, L.G.; Badnjevic, A.; Pecchia, L. A machine learning model for supporting symptom-based referral and diagnosis of bronchitis and pneumonia in limited resource settings. Biocybern. Biomed. Eng. 2021, 41, 1288–1302. [Google Scholar] [CrossRef]
- Chandra, T.B.; Verma, K. Pneumonia detection on chest x-ray using machine learning paradigm. In Proceedings of the 3rd International Conference on Computer Vision and Image Processing, Jabalpur, India, 29 September–1 October 2018; pp. 21–33. [Google Scholar]
- Wang, Y.; Liu, Z.L.; Yang, H.; Li, R.; Liao, S.J.; Huang, Y.; Peng, M.H.; Liu, X.; Si, G.Y.; He, Q.Z.; et al. Prediction of viral pneumonia based on machine learning models analyzing pulmonary inflammation index scores. Comput. Biol. Med. 2024, 169, 107905. [Google Scholar] [CrossRef] [PubMed]
- Fateh, A.; Fateh, M.; Abolghasemi, V. Multilingual handwritten numeral recognition using a robust deep network joint with transfer learning. Inf. Sci. 2021, 581, 479–494. [Google Scholar] [CrossRef]
- Allioui, H.; Mohammed, M.A.; Benameur, N.; Al-Khateeb, B.; Abdulkareem, K.H.; Garcia-Zapirain, B.; Damaševičius, R.; Maskeliūnas, R. A multi-agent deep reinforcement learning approach for enhancement of COVID-19 CT image segmentation. J. Pers. Med. 2022, 12, 309. [Google Scholar] [CrossRef]
- Stephen, O.; Sain, M.; Maduh, U.J.; Jeong, D.U. An efficient deep learning approach to pneumonia classification in healthcare. J. Healthc. Eng. 2019, 2019, 4180949. [Google Scholar] [CrossRef] [PubMed]
- Rajpurkar, P.; Irvin, J.; Zhu, K.; Yang, B.; Mehta, H.; Duan, T.; Ding, D.; Bagul, A.; Langlotz, C.; Shpanskaya, K.; et al. Chexnet: Radiologist-level pneumonia detection on chest x-rays with deep learning. arXiv 2017, arXiv:1711.05225. [Google Scholar]
- Ukwuoma, C.C.; Qin, Z.; Heyat, M.B.B.; Akhtar, F.; Bamisile, O.; Muaad, A.Y.; Addo, D.; Al-Antari, M.A. A hybrid explainable ensemble transformer encoder for pneumonia identification from chest X-ray images. J. Adv. Res. 2023, 48, 191–211. [Google Scholar] [CrossRef]
- Jaiswal, A.K.; Tiwari, P.; Kumar, S.; Gupta, D.; Khanna, A.; Rodrigues, J.J. Identifying pneumonia in chest X-rays: A deep learning approach. Meas. 2019, 145, 511–518. [Google Scholar] [CrossRef]
- Gabruseva, T.; Poplavskiy, D.; Kalinin, A. Deep learning for automatic pneumonia detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 350–351. [Google Scholar]
- Gholami, M.; Fateh, M.; Fateh, A. Text-Enhanced Semantic Segmentation via Contrastive Language-Image Pretraining Guided Multi-Modal Feature Fusion with Feature Refinement Approach. Int. J. Eng. 2026, 39, 1422–1437. [Google Scholar] [CrossRef]
- Wang, X.; Yang, S.; Zhang, J.; Wang, M.; Zhang, J.; Huang, J.; Yang, W.; Han, X. Transpath: Transformer-based self-supervised learning for histopathological image classification. In Proceedings of the Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, 27 September–1 October 2021; pp. 186–195. [Google Scholar]
- Wu, P.; Chen, J.; Wu, Y. Swin Transformer based benign and malignant pulmonary nodule classification. In Proceedings of the 5th International Conference on Computer Information Science and Application Technology (CISAT 2022), Chongqing, China, 29–31 July 2022; pp. 552–558. [Google Scholar]
- Liu, Z.; Hu, H.; Lin, Y.; Yao, Z.; Xie, Z.; Wei, Y.; Ning, J.; Cao, Y.; Zhang, Z.; Dong, L.; et al. Swin transformer v2: Scaling up capacity and resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 12009–12019. [Google Scholar]
- Mishra, V.R.; Malhotra, R. Empowering healthcare with Swin Transformer V2: Advancing pneumonia diagnosis through deep learning. In Proceedings of the 4th International Conference on Computational Methods in Science & Technology (ICCMST 2024), Mohali, India, 2–3 May 2024; pp. 99–105. [Google Scholar]
- Angara, S.; Mannuru, N.R.; Mannuru, A.; Thirunagaru, S. A novel method to enhance pneumonia detection via a model-level ensembling of CNN and vision transformer. arXiv 2024, arXiv:2401.02358. [Google Scholar] [CrossRef]
- Mustapha, B.; Zhou, Y.; Shan, C.; Xiao, Z. Enhanced Pneumonia Detection in Chest X-Rays Using Hybrid Convolutional and Vision Transformer Networks. Curr. Med. Imaging 2025, 21, e15734056326685. [Google Scholar] [CrossRef]
- Anbalagan, T.; Nath, M.K.; Vijayalakshmi, D.; Anbalagan, A. Analysis of various techniques for ECG signal in healthcare, past, present, and future. Biomed. Eng. Adv. 2023, 6, 100089. [Google Scholar] [CrossRef]
- Alom, M.Z.; Yakopcic, C.; Taha, T.M.; Asari, V.K. Nuclei segmentation with recurrent residual convolutional neural networks based U-Net (R2U-Net). In Proceedings of the NAECON 2018-IEEE National Aerospace and Electronics Conference, Dayton, OH, USA, 23–26 July 2018; pp. 228–233. [Google Scholar]
- Lau, S.L.; Chong, E.K.; Yang, X.; Wang, X. Automated pavement crack segmentation using u-net-based convolutional neural network. IEEE Access 2020, 8, 114892–114899. [Google Scholar] [CrossRef]
- Azad, R.; Asadi-Aghbolaghi, M.; Fathy, M.; Escalera, S. Bi-directional ConvLSTM U-Net with densley connected convolutions. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, Republic of Korea, 27–28 October 2019. [Google Scholar]
- Jalali, Y.; Fateh, M.; Rezvani, M.; Abolghasemi, V.; Anisi, M.H. ResBCDU-Net: A deep learning framework for lung CT image segmentation. Sensors 2021, 21, 268. [Google Scholar] [CrossRef]
- Zhang, Y.; Davison, B.D.; Talghader, V.W.; Chen, Z.; Xiao, Z.; Kunkel, G.J. Automatic head overcoat thickness measure with NASNet-large-decoder net. In Proceedings of the Future Technologies Conference (FTC) 2021, Vancouver, BC, Canada, 28–29 October 2021; pp. 159–176. [Google Scholar]
- Jalali, Y.; Fateh, M.; Rezvani, M. DABT-U-Net: Dual Attentive BConvLSTM U-Net with Transformers and Collaborative Patch-based Approach for Accurate Retinal Vessel Segmentation. Int. J. Eng. 2024, 37, 2051–2065. [Google Scholar] [CrossRef]
- Rezvani, S.; Fateh, M.; Khosravi, H. ABANet: Attention boundary-aware network for image segmentation. Expert Syst. 2024, 41, e13625. [Google Scholar] [CrossRef]
- Rezvani, S.; Fateh, M.; Jalali, Y.; Fateh, A. FusionLungNet: Multi-scale fusion convolution with refinement network for lung CT image segmentation. Biomed. Signal Process. Control. 2025, 107, 107858. [Google Scholar] [CrossRef]
- Zhao, A.; Wu, H.; Chen, M.; Wang, N. DCACorrCapsNet: A deep channel-attention correlative capsule network for COVID-19 detection based on multi-source medical images. IET Image Process. 2023, 17, 988–1000. [Google Scholar] [CrossRef]
- Jiang, Z.; Chen, L. Multisemantic level patch merger vision transformer for diagnosis of pneumonia. Comput. Math. Methods Med. 2022, 2022, 7852958. [Google Scholar] [CrossRef]
- Mabrouk, A.; Diaz Redondo, R.P.; Dahou, A.; Abd Elaziz, M.; Kayed, M. Pneumonia detection on chest X-ray images using ensemble of deep convolutional neural networks. Appl. Sci. 2022, 12, 6448. [Google Scholar] [CrossRef]
- Goodwin, B.D.; Jaskolski, C.; Zhong, C.; Asmani, H. Intra-model variability in COVID-19 classification using chest x-ray images. arXiv 2020, arXiv:2005.02167. [Google Scholar]
- Gazda, M.; Plavka, J.; Gazda, J.; Drotar, P. Self-supervised deep convolutional neural network for chest X-ray classification. IEEE Access 2021, 9, 151972–151982. [Google Scholar] [CrossRef]
- Van, M.H.; Verma, P.; Wu, X. On large visual language models for medical imaging analysis: An empirical study. In Proceedings of the 2024 IEEE/ACM Conference on Connected Health: Applications, Systems and Engineering Technologies (CHASE), Wilmington, DE, USA, 19–21 June 2024; pp. 172–176. [Google Scholar]
- Yadav, S.S.; Jadhav, S.M. Deep convolutional neural network based medical image classification for disease diagnosis. J. Big Data 2019, 6, 1–18. [Google Scholar] [CrossRef]
- Ayan, E.; Ünver, H.M. Diagnosis of pneumonia from chest X-ray images using deep learning. In Proceedings of the 2019 Scientific Meeting on Electrical-Electronics & Biomedical Engineering and Computer Science (EBBT), Istanbul, Turkey, 24–26 April 2019; pp. 1–5. [Google Scholar]
- Chattopadhyay, S.; Ganguly, S.; Chaudhury, S.; Nag, S.; Chattopadhyay, S. Exploring Self-Supervised Representation Learning for Low-Resource Medical Image Analysis. In Proceedings of the 2023 IEEE International Conference on Image Processing (ICIP), Kuala Lumpur, Malaysia, 8–11 October 2023; pp. 1440–1444. [Google Scholar]
- Bhatt, H.; Shah, M. A Convolutional Neural Network ensemble model for Pneumonia Detection using chest X-ray images. Healthc. Anal. 2023, 3, 100176. [Google Scholar] [CrossRef]
- Reshan, M.S.A.; Gill, K.S.; Anand, V.; Gupta, S.; Alshahrani, H.; Sulaiman, A.; Shaikh, A. Detection of pneumonia from chest X-ray images utilizing mobilenet model. Healthcare 2023, 11, 1561. [Google Scholar] [CrossRef]
- Zhang, S.; Fu, Y.; Fang, L.; Xu, Q.; Gu, S.; Zhou, H.; Zhou, J. Psittacosis pneumonia with the reversed halo sign: A case report and literature review. BMC Infect. Dis. 2025, 25, 717. [Google Scholar] [CrossRef]
- Wahyudi, F.; Saputra, R.D. Hydropneumothorax Secondary to Community-Acquired Pneumonia with Broncholith: A Case Report. Biosci. Med. J. Biomed. Transl. Res. 2025, 9, 6005–6018. [Google Scholar]









| Model | Dice | Accuracy | Precision | Recall | F1-Score | MCC | Parameters |
|---|---|---|---|---|---|---|---|
| Unet [20] | 93.46% | 96.88% | 97.41% | 89.83% | 93.46% | 91.55% | 31.2 M |
| RU-Net [45] | 92.07% | 96.34% | 99.58% | 85.61% | 92.07% | 90.13% | 44.2 M |
| ResNet34-Unet [46] | 93.83% | 97.06% | 98.13% | 89.89% | 93.83% | 92.06% | 28.5 M |
| BCDU-Net [47] | 94.14% | 97.20% | 98.25% | 90.37% | 94.14% | 92.44% | 65.2 M |
| ResBCDUnet [48] | 94.34% | 97.31% | 98.89% | 90.20% | 94.34% | 92.75% | - |
| NasNet [49] | 94.95% | 97.52% | 96.55% | 93.42% | 94.95% | 93.33% | - |
| DABT-U-Net [50] | 95.11% | 97.64% | 98.25% | 92.16% | 95.11% | 93.64% | - |
| ABANet [51] | 95.25% | 97.71% | 98.53% | 92.18% | 95.25% | 93.84% | - |
| FusionLungNet [52] | 95.29% | 97.73% | 98.66% | 92.14% | 95.29% | 93.89% | 48.1 M |
| Ours | 95.7% ± 0.5 | 97.9% ± 0.5 | 97.5% ± 0.6 | 93.9% ± 0.6 | 95.7% ± 0.5 | 94.3% ± 0.6 | 43.3 M |
| Models | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| Densenet121 * | 87.8% | 53.9% | 71.0% | 61.27% |
| Densenet169 * | 87.1% | 32.3% | 65.6% | 43.28% |
| Densenet201 * | 88.4% | 51.9% | 79.0% | 62.64% |
| Mobilenet_v2 * | 86.9% | 33.4% | 75.0% | 46.21% |
| ResNet-50 * | 87.1% | 38.4% | 71.0% | 49.84% |
| ResNet-101 * | 87.9% | 33.5% | 73.0% | 45.92% |
| Swin Transformer V2 [40] | 92.55% | 93.79% | 88.46% | 90.82% |
| Goodwin et al. (Ensemble learning) [56] | 89.4% | 53.3% | 80.0% | 63.97% |
| Gazda et al. [57] | 84.9% | 77.4% | 90.6% | 83.48% |
| Zhao et al. (Channel-Attention Capsule) [53] | 90.43% | 90.81% | 90.43% | 90.40% |
| CNN-based [58] | 92.52% | - | - | - |
| CNN-based [58] | 91.05% | - | - | - |
| Proposed method (ResNet-50 as backbone) | 96.04% ± 0.5 | 96.70% ± 0.5 | 94.90% ± 0.6 | 95.77% ± 0.5 |
| Proposed method (ResNet-101 as backbone) | 95.19% ± 0.6 | 95.59% ± 0.5 | 94.19% ± 0.6 | 94.86% ± 0.6 |
| Models | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| Yadav et al. (VGG16 as backbone) [59] | 88.50% | - | - | - |
| Ayan et al. (VGG16 as backbone) [60] | 87.98% | 82.72% | 85.90% | 84.28% |
| Chattopadhyay et al. [61] | 81.7% | - | - | 80.6% |
| Bhatt et al. (CNN) [62] | 85.58% | 83.33% | 96.15% | 89.29% |
| Reshan et al. (MobileNet as backbone) [63] | 90.85% | 91.41% | 95.28% | 91.41% |
| Reshan et al. (ResNet152B2 as backbone) [63] | 84.65% | 82.38% | 99.21% | 90.02% |
| Reshan et al. (DenseNet121 as backbone) [63] | 88.90% | 88.33% | 96.87% | 92.41% |
| Reshan et al. (Xception as backbone) [63] | 87.59% | 91.75% | 90.32% | 91.03% |
| Reshan et al. (EfficientNet as backbone) [63] | 51.02% | 86.21% | 45.85% | 90.10% |
| Jiang et al. (MP-ViT) [54] | 91.19% | 91.82% | 89.36% | 90.34% |
| ViT in [55] | 92.45% | 92.47% | 92.44% | 92.47% |
| Proposed method (ResNet-50 as backbone) | 91.67% ± 0.6 | 92.04% ± 0.5 | 94.87% ± 0.5 | 93.43% ± 0.6 |
| Proposed method (ResNet-101 as backbone) | 93.75% ± 0.5 | 93.98% ± 0.5 | 96.16% ± 0.5 | 95.05% ± 0.5 |
| Backbones | Results of Proposed Method | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|---|
| ResNet-50 | on original images | 91.23% | 90.94% | 85.60% | 88.19% |
| ResNet-50 | on predicted masks | 96.04% | 96.70% | 94.90% | 95.77% |
| ResNet-101 | on original images | 90.22% | 88.16% | 87.04% | 87.60% |
| ResNet-101 | on predicted masks | 95.19% | 95.59% | 94.19% | 94.86% |
| Backbones | Baseline | Multi-Scale Feature Maps | CRAM + Transformer | Accuracy | Precision | Recall | F1-Score | Training Time (per Image) | Inference Time | Learnable Parameters |
|---|---|---|---|---|---|---|---|---|---|---|
| ResNet-50 | X | | | 84.62% ± 0.04 | 75.68% ± 0.05 | 75.10% ± 0.06 | 75.39% ± 0.04 | 2.7 ms | 0.8 ms | 0.65 M |
| ResNet-50 | X | X | | 87.73% ± 0.03 | 80.55% ± 0.04 | 79.62% ± 0.05 | 80.08% ± 0.04 | 2.8 ms | 0.8 ms | 0.85 M |
| ResNet-50 | X | | X | 87.73% ± 0.03 | 80.63% ± 0.04 | 80.47% ± 0.03 | 80.55% ± 0.03 | 3.5 ms | 1.2 ms | 1.15 M |
| ResNet-50 | X | X | X | 91.23% ± 0.02 | 90.94% ± 0.03 | 85.60% ± 0.03 | 88.19% ± 0.03 | 12.2 ms | 5.4 ms | 2.29 M |
| ResNet-101 | X | | | 83.93% ± 0.04 | 74.21% ± 0.05 | 73.92% ± 0.04 | 74.06% ± 0.04 | 9.7 ms | 2.9 ms | 0.65 M |
| ResNet-101 | X | X | | 85.56% ± 0.03 | 78.21% ± 0.04 | 80.40% ± 0.05 | 79.29% ± 0.04 | 12.3 ms | 3.6 ms | 0.85 M |
| ResNet-101 | X | | X | 85.71% ± 0.03 | 78.59% ± 0.04 | 81.11% ± 0.04 | 79.83% ± 0.03 | 14.6 ms | 5.3 ms | 1.15 M |
| ResNet-101 | X | X | X | 90.22% ± 0.03 | 88.16% ± 0.03 | 87.04% ± 0.03 | 87.60% ± 0.03 | 43.7 ms | 19.1 ms | 2.29 M |
| Backbones | Block 2 | Block 3 | Block 4 | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|---|---|---|
| ResNet-50 | X | 84.62% | 75.68% | 75.10% | 75.39% | ||
| X | X | 85.86% | 77.03% | 74.37% | 75.68% | ||
| X | X | 86.64% | 79.36% | 76.15% | 77.72% | ||
| X | X | X | 86.72% | 79.02% | 76.48% | 77.73% | |
| X | Combined | 86.96% | 78.83% | 79.23% | 79.02% | ||
| Combined | X | 87.73% | 80.55% | 79.62% | 80.08% | ||
| Backbone | Concat | Element-Wise Addition | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|---|---|
| ResNet-50 | X | | 86.18% | 75.88% | 76.68% | 76.28% |
| ResNet-50 | | X | 87.73% | 80.55% | 79.62% | 80.08% |
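The two fusion strategies compared in the last table differ mainly in the shape of the fused feature. The sketch below is a hedged illustration on plain channel vectors: concatenation stacks the channels of both inputs, whereas element-wise addition requires equal channel counts and keeps the dimensionality unchanged. The function names and values are hypothetical, and any projection the authors apply to align channel counts before fusion is not modelled here.

```python
def fuse_concat(a, b):
    """Channel-wise concatenation: the result has len(a) + len(b) channels."""
    return list(a) + list(b)


def fuse_add(a, b):
    """Element-wise addition: inputs must have the same number of channels."""
    if len(a) != len(b):
        raise ValueError("element-wise addition needs equal channel counts")
    return [x + y for x, y in zip(a, b)]


# Hypothetical 3-channel descriptors from two backbone blocks.
shallow = [0.2, 0.5, 0.1]
deep = [0.4, 0.1, 0.3]

concatenated = fuse_concat(shallow, deep)  # 6 channels
added = fuse_add(shallow, deep)            # 3 channels
```

Addition keeps the input size of every downstream layer fixed, so it adds no parameters to those layers, while concatenation widens them.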
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Saber, A.; Fateh, A.; Parhami, P.; Siahkarzadeh, A.; Fateh, M.; Ferdowsi, S. Efficient and Accurate Pneumonia Detection Using a Novel Multi-Scale Transformer Approach. Sensors 2025, 25, 7233. https://doi.org/10.3390/s25237233

