A Lightweight Hybrid Deep Learning Model for Tuberculosis Detection from Chest X-Rays
Abstract
1. Introduction
- Limited Dataset Size: The development of a robust TB detection model relies heavily on the availability of sufficiently large and diverse datasets. However, much of the existing research used relatively small datasets, preventing the resulting models from generalizing effectively to different populations and clinical settings.
- Inadequate Handling of Complex Patterns: Many conventional methods rely on handcrafted feature extraction techniques, which struggle to learn the complex hierarchical patterns present in chest X-ray images. This restricts their ability to make fine distinctions between TB and non-TB conditions.
- Single-Model Dependency: A significant portion of earlier research is based on a single deep learning model, which may limit the ability to capture diverse and complementary feature representations and can thus degrade detection performance.
- High Computational Cost: Although some cutting-edge techniques attained excellent accuracy, their high computational complexity limits their real-time applicability and deployment in resource-constrained environments.
- Development of a Hybrid Deep Learning Model: A novel hybrid deep learning approach is introduced that combines GhostNet, a lightweight CNN, with MobileViT, a transformer-based model. The combination leverages both local feature extraction (CNN) and global context modeling (transformers) for more accurate TB classification.
- Feature Fusion Strategy: An effective feature-level fusion mechanism is developed, in which globally pooled feature maps from GhostNet and MobileViT are concatenated and passed through a joint classification head.
- Balanced Trade-Off Between Efficiency and Performance: The introduced model attains high diagnostic accuracy (up to 99.52%) while preserving computational efficiency (just 7.73 million parameters and 282.11M floating point operations (FLOPs)), outperforming most state-of-the-art models on this trade-off.
- Thorough Assessment on Two Datasets: This study employs two independently sourced chest X-ray datasets: dataset 1, with a total of 7000 images (3500 TB and 3500 normal), and dataset 2, with 1600 images (800 TB and 800 normal). Each dataset was randomly divided into training (70%), validation (15%), and testing (15%) subsets. All performance metrics reported in this study are based exclusively on the held-out test subsets, ensuring an unbiased evaluation of the model's generalization across different imaging distributions.
- Five-Fold Cross-Validation: A comprehensive 5-fold cross-validation evaluation was conducted to ensure statistical reliability and demonstrate consistent generalization across different partitions of the data.
- Mathematical Formalization of the Hybrid Network: A comprehensive mathematical model of the proposed hybrid architecture is presented.
2. Literature Review
2.1. Machine Learning
2.2. Deep Learning
3. Methodology
3.1. Data Collection
3.1.1. Dataset 1
- The National Library of Medicine (NLM) Dataset [56], which contains images from two lung X-ray collections: the Shenzhen, China, dataset (336 TB-positive, 326 healthy; 3000 × 3000 pixels) and the Montgomery County dataset (58 TB-positive, 80 healthy; 4020 × 4892 pixels).
- Belarus Dataset [57]: Compiled by organizations under the Ministry of Health of the Republic of Belarus, it includes the CXRs of 306 TB patients, each scanned at a resolution of 2248 × 2248 pixels.
- NIAID TB Portal Program Dataset [58]: A total of 2800 TB-positive CXR images from roughly 3087 cases, gathered from seven different countries.
3.1.2. Dataset 2
3.2. Data Preprocessing
- Grayscaling: Conversion of RGB images to grayscale.
- Image resizing: Each grayscale image was resized to a fixed size of 128 × 128 pixels.
- Contrast Limited Adaptive Histogram Equalization (CLAHE): CLAHE improves local image contrast, making tuberculosis-affected regions more pronounced and clearer.
- Image normalization: Normalization rescales pixel intensities to a consistent range (typically [0, 1]), which makes inputs uniform across images and stabilizes training. A minimal sketch of the full pipeline follows this list.
3.3. Data Augmentation
3.4. Feature Extraction
3.4.1. GhostNet
3.4.2. MobileViT
3.4.3. Proposed Model
- GhostNet Path: GhostNet uses a series of ghost modules to generate additional feature maps with fewer computations. It concentrates on capturing the spatial and texture-level patterns that are typically present in medical imaging data.
- MobileViT Path: MobileViT combines the inductive bias of CNNs with the long-range feature modeling capacity of transformers, enabling the network to efficiently capture fine-grained details as well as coarse contextual cues from the input images. A sketch of how the two paths can be fused is shown after this list.
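The following PyTorch sketch shows how such a dual-path fusion model could be assembled. The `timm` backbone names (`ghostnet_100`, `mobilevit_s`) and the hidden width of the classification head are assumptions for illustration, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
import timm  # assumed backbone source; the paper does not name an implementation

class HybridTBNet(nn.Module):
    """Sketch of the GhostNet + MobileViT feature-level fusion of Section 3.4.3."""
    def __init__(self, num_classes: int = 2, hidden: int = 256, p_drop: float = 0.5):
        super().__init__()
        # num_classes=0 makes timm return globally pooled feature vectors.
        self.ghost = timm.create_model("ghostnet_100", pretrained=True, num_classes=0)
        self.mvit = timm.create_model("mobilevit_s", pretrained=True, num_classes=0)
        fused_dim = self.ghost.num_features + self.mvit.num_features
        self.head = nn.Sequential(
            nn.Linear(fused_dim, hidden), nn.ReLU(),   # FC layer with ReLU
            nn.Dropout(p_drop),                        # dropout for regularization
            nn.Linear(hidden, num_classes),            # final classification layer (logits)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Concatenate the globally pooled features of both paths, then classify.
        f = torch.cat([self.ghost(x), self.mvit(x)], dim=1)
        return self.head(f)
```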
3.4.4. Mathematical Modeling of the Proposed Hybrid Network
1. Input and Pre-processing
- Resized to 128 × 128 and converted to grayscale: $x \in \mathbb{R}^{H \times W}$, where $H = W = 128$ (grayscale image).
- Enhanced using CLAHE (Contrast Limited Adaptive Histogram Equalization): $x_{\mathrm{CLAHE}} = \mathrm{CLAHE}(x)$.
- Normalized and converted to a 3-channel image for compatibility with pre-trained models [68].
2. GhostNet Branch
The GhostNet path produces a globally pooled feature vector: $f_{\mathrm{Ghost}} = \mathrm{GAP}(\mathrm{GhostNet}(x_{\mathrm{CLAHE}}))$, where $\mathrm{GAP}(\cdot)$ denotes global average pooling.
3. MobileViT Branch
The MobileViT path likewise yields $f_{\mathrm{MobileViT}} = \mathrm{GAP}(\mathrm{MobileViT}(x_{\mathrm{CLAHE}}))$.
4. Feature Fusion
The two feature vectors are concatenated: $f_{\mathrm{fused}} = f_{\mathrm{Ghost}} \oplus f_{\mathrm{MobileViT}}$, where:
- $\oplus$ denotes the concatenation operation;
- $\dim(f_{\mathrm{fused}}) = \dim(f_{\mathrm{Ghost}}) + \dim(f_{\mathrm{MobileViT}})$.
5. Classification Head
Fully connected layer with ReLU: $h = \mathrm{ReLU}(W_{1} f_{\mathrm{fused}} + b_{1})$, where $W_{1}$ is the weight matrix of the first fully connected layer and $b_{1}$ is the bias vector.
Dropout for regularization: $h' = \mathrm{Dropout}(h)$.
Final classification layer: $\hat{y} = \mathrm{softmax}(W_{2} h' + b_{2})$, where $\hat{y}$ is the predicted probability distribution over the two classes (normal and TB), $W_{2}$ is the weight matrix of the final output layer, and $b_{2}$ is the bias vector for the final layer.
6. Optimization Objective
- The model is trained using the cross-entropy loss [69]: $\mathcal{L} = -\sum_{c} y_{c} \log \hat{y}_{c}$, where $y_{c}$ is the ground-truth label and $\hat{y}_{c}$ is the predicted probability from the softmax output.
- The model is optimized using the Adam optimizer [70] with a learning rate of 0.001: $\theta_{t+1} = \theta_{t} - \eta \nabla_{\theta}\mathcal{L}(\theta_{t})$, where $\theta$ denotes the model parameters, $\nabla_{\theta}\mathcal{L}$ is the gradient of the loss function with respect to the parameters $\theta$, and $\eta$ is the learning rate. (A minimal training-step sketch is given after this list.)
7. The final pipeline of the suggested approach can be described as shown in Equation (17):
$\hat{y} = \mathrm{softmax}\left(W_{2}\,\mathrm{Dropout}\left(\mathrm{ReLU}\left(W_{1}\left[\mathrm{GAP}(\mathrm{GhostNet}(x)) \oplus \mathrm{GAP}(\mathrm{MobileViT}(x))\right] + b_{1}\right)\right) + b_{2}\right)$
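Under this training configuration (cross-entropy loss, Adam, learning rate 0.001), one epoch of optimization could be sketched as follows. `HybridTBNet` refers to the illustrative model above; the data loader, which is assumed to yield preprocessed batches, is not shown.

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = HybridTBNet().to(device)               # illustrative model from Section 3.4.3
criterion = nn.CrossEntropyLoss()              # the loss L above (softmax applied internally)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # eta = 0.001, as stated

def train_one_epoch(loader):
    model.train()
    for images, labels in loader:              # batches of preprocessed (3, 128, 128) inputs
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)  # cross-entropy on the fused-head logits
        loss.backward()                          # gradient of L with respect to theta
        optimizer.step()                         # Adam update of theta
```

Note that `nn.CrossEntropyLoss` expects raw logits, which is why the illustrative model's forward pass omits an explicit softmax.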
4. Results and Discussion
4.1. Performance Metrics
4.2. Hyperparameters
4.3. Models’ Evaluation and Selection for the Proposed Model
4.4. Performance Analysis of the Proposed Hybrid Model
4.5. Additional Validation via 5-Fold Cross-Validation
4.6. Hyperparameter Optimization
5. Conclusions and Future Works
- Multi-Class Classification: Expanding the binary classification (TB vs. normal) to cover several pulmonary diseases, such as pneumonia, fibrosis, or lung cancer, would make the model more clinically useful.
- Clinical Deployment and Real-World Validation: Future efforts should also involve testing the model in real clinical settings, integrating it with radiology workflows, and running it in real time on different imaging devices.
- Cross-Modal Extension: Investigating hybrid architectures that combine chest X-rays with other modalities (e.g., CT scans or clinical reports) could boost diagnostic performance in difficult or ambiguous cases.
- External Validation: To further assess the robustness and clinical applicability of the proposed model, future work will focus on validating it on larger, multi-center datasets.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Parveen Rahamathulla, M.; Sam Emmanuel, W.R.; Bindhu, A.; Mustaq Ahmed, M. YOLOv8’s advancements in tuberculosis identification from chest images. Front. Big Data 2024, 7, 1401981–1401991. [Google Scholar] [CrossRef] [PubMed]
- Wang, Y.; Jiang, Z.; Liang, P.; Liu, Z.; Cai, H.; Sun, Q. TB-DROP: Deep learning-based drug resistance prediction of Mycobacterium tuberculosis utilizing whole genome mutations. BMC Genomics 2024, 25, 167–177. [Google Scholar] [CrossRef] [PubMed]
- World Health Organization. Global Tuberculosis Report. 2020. Available online: https://www.who.int/publications/i/item/9789240013131 (accessed on 1 January 2025).
- Kotei, E.; Thirunavukarasu, R. A comprehensive review on advancement in deep learning techniques for automatic detection of tuberculosis from chest X-ray images. Arch. Comput. Methods Eng. 2024, 31, 455–474. [Google Scholar] [CrossRef]
- World Health Organization. Global Tuberculosis Report. 2021. Available online: https://www.who.int/publications/i/item/9789240037021 (accessed on 1 January 2025).
- Sharma, S.K.; Mohan, A. Tuberculosis: From an incurable scourge to a curable disease-journey over a millennium. Indian J. Med. Res. 2013, 137, 455–493. [Google Scholar] [PubMed]
- van’t Hoog, A.H.; Meme, H.K.; Laserson, K.F.; Agaya, J.A.; Muchiri, B.G.; Githui, W.A.; Odeny, L.O.; Marston, B.J.; Borgdorff, M.W. Screening strategies for tuberculosis prevalence surveys: The value of chest radiography and symptoms. PLoS ONE 2012, 7, 38691–38701. [Google Scholar] [CrossRef]
- Degnan, A.J.; Ghobadi, E.H.; Hardy, P.; Krupinski, E.; Scali, E.P.; Stratchko, L.; Ulano, A.; Walker, E.; Wasnik, A.P.; Auffermann, W.F. Perceptual and interpretive error in diagnostic radiology—Causes and potential solutions. Acad. Radiol. 2019, 26, 833–845. [Google Scholar] [CrossRef]
- Van Cleeff, M.R.A.; Kivihya-Ndugga, L.E.; Meme, H.; Odhiambo, J.A.; Klatser, P.R. The role and performance of chest X-ray for the diagnosis of tuberculosis: A cost-effectiveness analysis in Nairobi, Kenya. BMC Infect. Dis. 2005, 5, 111. [Google Scholar] [CrossRef]
- Gore, J.C. Artificial intelligence in medical imaging. Magn. Reson. Imaging 2020, 68, A1–A4. [Google Scholar] [CrossRef]
- Kaur, S.; Singla, J.; Nkenyereye, L.; Jha, S.; Prashar, D.; Joshi, G.P.; El-Sappagh, S.; Islam, M.S.; Riazul Islam, S.M. Medical diagnostic systems using artificial intelligence (AI) algorithms: Principles and perspectives. IEEE Access 2020, 8, 228049–228069. [Google Scholar] [CrossRef]
- Vattikuti, M.C. A Comprehensive Review of AI-Based Diagnostic Tools for Early Disease Detection in Healthcare. Res. J. 2020, 6, 1–10. [Google Scholar]
- Waheed, Z.; Gui, J.; Amjad, K.; Waheed, I.; Asif, S. An ensemble approach of deep CNN models with Beta normalization aggregation for gastrointestinal disease detection. Biomed. Signal Process. Control 2025, 105, 107567–107577. [Google Scholar] [CrossRef]
- Abumihsan, A.; Owda, A.Y.; Owda, M.; Abumohsen, M.; Stergioulas, L.; Abu Amer, M.A.A. A Novel Hybrid Model for Brain Ischemic Stroke Detection Using Feature Fusion and Convolutional Block Attention Module. IEEE Access 2025, 13, 44466–44483. [Google Scholar] [CrossRef]
- Puttagunta, M.K.; Ravi, S. Detection of Tuberculosis based on Deep Learning based methods. J. Phys. Conf. Ser. 2021, 1767, 12004–12009. [Google Scholar] [CrossRef]
- Devnath, L.; Luo, S.; Summons, P.; Wang, D. Automated detection of pneumoconiosis with multilevel deep features learned from chest X-Ray radiographs. Comput. Biol. Med. 2021, 129, 104125–104135. [Google Scholar] [CrossRef]
- Stirenko, S.; Kochura, Y.; Alienin, O.; Rokovyi, O.; Gang, P.; Zeng, W.; Gordienko, Y. Chest X-ray analysis of tuberculosis by deep learning with segmentation and augmentation. In Proceedings of the 2018 IEEE 38th International Conference on Electronics and Nanotechnology (ELNANO), Kyiv, Ukraine, 24–26 April 2018; pp. 422–428. [Google Scholar] [CrossRef]
- Pasa, F.; Golkov, V.; Pfeiffer, F.; Cremers, D.; Pfeiffer, D. Efficient deep network architectures for fast chest X-ray tuberculosis screening and visualization. Sci. Rep. 2019, 9, 6268–6278. [Google Scholar] [CrossRef]
- Abideen, Z.U.; Ghafoor, M.; Munir, K.; Saqib, M.; Ullah, A.; Zia, T.; Tariq, S.A.; Ahmed, G.; Zahra, A. Uncertainty assisted robust tuberculosis identification with bayesian convolutional neural networks. IEEE Access 2020, 8, 22812–22825. [Google Scholar] [CrossRef] [PubMed]
- Xie, Y.; Wu, Z.; Han, X.; Wang, H.; Wu, Y.; Cui, L.; Feng, J.; Zhu, Z.; Chen, Z. Computer-Aided System for the Detection of Multicategory Pulmonary Tuberculosis in Radiographs. J. Healthc. Eng. 2020, 2020, 9205082–9205095. [Google Scholar] [CrossRef] [PubMed]
- Ahmed, M.S.; Rahman, A.; AlGhamdi, F.; AlDakheel, S.; Hakami, H.; AlJumah, A.; AlIbrahim, Z.; Youldash, M.; Alam Khan, M.A.; Basheer Ahmed, M.I. Joint diagnosis of pneumonia, COVID-19, and tuberculosis from chest X-ray images: A deep learning approach. Diagnostics 2023, 13, 2562. [Google Scholar] [CrossRef] [PubMed]
- Xu, F.; Mei, S.; Zhang, G.; Wang, N.; Du, Q. Bridging cnn and transformer with cross attention fusion network for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–15. [Google Scholar] [CrossRef]
- Khan, A.; Sohail, A.; Zahoora, U.; Qureshi, A.S. A survey of the recent architectures of deep convolutional neural networks. Artif. Intell. Rev. 2020, 53, 5455–5516. [Google Scholar] [CrossRef]
- Gaudêncio, A.S.; Carvalho, M.; Vaz, P.G.; Cardoso, J.M.; Humeau-Heurtier, A. Tuberculosis detection on chest X-rays using two-dimensional multiscale symbolic dynamic entropy. Biomed. Signal Process. Control 2026, 111, 108346. [Google Scholar] [CrossRef]
- Ulutas, H.; Sahin, M.E.; Karakus, M.O. Application of a novel deep learning technique using CT images for COVID-19 diagnosis on embedded systems. Alex. Eng. J. 2023, 74, 345–358. [Google Scholar] [CrossRef]
- Jiang, X.; Lu, S.-Y.; Zhang, Y.-D. SAM-LCA: A computationally efficient SAM-based model for tuberculosis detection in chest X-rays. Multimed. Syst. 2025, 31, 204. [Google Scholar] [CrossRef]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar] [CrossRef]
- Shamshad, F.; Khan, S.; Zamir, S.W.; Khan, M.H.; Hayat, M.; Shahbaz Khan, F.; Fu, H. Transformers in medical imaging: A survey. Med. Image Anal. 2023, 88, 102802–102822. [Google Scholar] [CrossRef]
- Khan, S.; Naseer, M.; Hayat, M.; Zamir, S.W.; Khan, F.S.; Shah, M. Transformers in vision: A survey. ACM Comput. Surv. 2022, 54, 1–41. [Google Scholar] [CrossRef]
- Maurício, J.; Domingues, I.; Bernardino, J. Comparing vision transformers and convolutional neural networks for image classification: A literature review. Appl. Sci. 2023, 13, 5521. [Google Scholar] [CrossRef]
- Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. Ghostnet: More features from cheap operations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1580–1589. [Google Scholar] [CrossRef]
- Mehta, S.; Rastegari, M. Mobilevit: Light-weight, general-purpose, and mobile-friendly vision transformer. arXiv 2021, arXiv:2110.02178. [Google Scholar] [CrossRef]
- Ghanshala, T.; Tripathi, V.; Pant, B. An effective vision based framework for the identification of tuberculosis in chest X-Ray images. In Proceedings of the 4th International Conference on Advances in Computing and Data Sciences (ICACDS 2020), Valletta, Malta, 24–25 April 2020; Revised Selected Papers 4. pp. 36–45. [Google Scholar] [CrossRef]
- Chandra, T.B.; Verma, K.; Singh, B.K.; Jain, D.; Netam, S.S. Automatic detection of tuberculosis related abnormalities in Chest X-ray images using hierarchical feature extraction scheme. Expert Syst. Appl. 2020, 158, 113514–113527. [Google Scholar] [CrossRef]
- Jaeger, S.; Karargyris, A.; Candemir, S.; Folio, L.; Siegelman, J.; Callaghan, F.; Xue, Z.; Palaniappan, K.; Singh, R.K.; Antani, S.; et al. Automatic tuberculosis screening using chest radiographs. IEEE Trans. Med. Imaging 2013, 33, 233–245. [Google Scholar] [CrossRef]
- Muhathir, M.; Sibarani, T.T.S.; Al-Khowarizmi, A. Analysis K-Nearest Neighbors (KNN) in Identifying Tuberculosis Disease (Tb) By Utilizing Hog Feature Extraction. Al’adzkiya Int. Comput. Sci. Inf. Technol. J. 2020, 1, 75–80. [Google Scholar] [CrossRef]
- Wang, Z.; Li, T. A lightweight CNN model based on GhostNet. Comput. Intell. Neurosci. 2022, 2022, 8396550. [Google Scholar] [CrossRef]
- Wadekar, S.N.; Chaurasia, A. Mobilevitv3: Mobile-friendly vision transformer with simple and effective fusion of local, global and input features. arXiv 2022, arXiv:2209.15159. [Google Scholar]
- Rahman, T.; Khandakar, A.; Kadir, M.A.; Islam, K.R.; Islam, K.F.; Mazhar, R.; Hamid, T.; Islam, M.T.; Kashem, S.; Mahbub, Z.B.; et al. Reliable tuberculosis detection using chest X-ray with deep learning, segmentation and visualization. IEEE Access 2020, 8, 191586–191601. [Google Scholar] [CrossRef]
- Shome, N.; Kashyap, R.; Laskar, R.H. Detection of tuberculosis using customized MobileNet and transfer learning from chest X-ray image. Image Vis. Comput. 2024, 147, 105063–105075. [Google Scholar] [CrossRef]
- Bhuria, R.; Gupta, S. Automated Tuberculosis Detection from Chest X-Rays: A Deep Learning Approach with DenseNet121. In Proceedings of the 2024 Global Conference on Communications and Information Technologies (GCCIT), Bangalore, India, 25–26 October 2024; pp. 1–5. [Google Scholar] [CrossRef]
- Karnkawinpong, T.; Limpiyakorn, Y. Chest X-ray analysis of tuberculosis by convolutional neural networks with affine transforms. In Proceedings of the 2018 2nd International Conference on Computer Science and Artificial Intelligence, Shenzhen, China, 8–10 December 2018; pp. 90–93. [Google Scholar] [CrossRef]
- Ahsan, M.; Gomes, R.; Denton, A. Application of a convolutional neural network using transfer learning for tuberculosis detection. In Proceedings of the 2019 IEEE International Conference on Electro Information Technology (EIT), Brookings, SD, USA, 20–22 May 2019; pp. 427–433. [Google Scholar] [CrossRef]
- Huy, V.T.Q.; Lin, C.-M. An improved densenet deep neural network model for tuberculosis detection using chest x-ray images. IEEE Access 2023, 11, 42839–42849. [Google Scholar] [CrossRef]
- Ejiyi, C.J.; Qin, Z.; Nnani, A.O.; Deng, F.; Ejiyi, T.U.; Ejiyi, M.B.; Agbesi, V.K.; Bamisile, O. ResfEANet: ResNet-fused external attention network for tuberculosis diagnosis using chest X-ray images. Comput. Methods Programs Biomed. Update 2024, 5, 100133–100147. [Google Scholar] [CrossRef]
- Munadi, K.; Muchtar, K.; Maulina, N.; Pradhan, B. Image enhancement for tuberculosis detection using deep learning. IEEE Access 2020, 8, 217897–217907. [Google Scholar] [CrossRef]
- Sivaramakrishnan, R.; Antani, S.; Candemir, S.; Xue, Z.; Abuya, J.; Kohli, M.; Alderson, P.; Thoma, G. Comparing deep learning models for population screening using chest radiography. In Proceedings of the Medical Imaging 2018: Computer-Aided Diagnosis, Houston, TX, USA, 12–15 February 2018; pp. 322–332. [Google Scholar] [CrossRef]
- Sahlol, A.T.; Abd Elaziz, M.; Tariq Jamal, A.; Damaševičius, R.; Farouk Hassan, O. A novel method for detection of tuberculosis in chest radiographs using artificial ecosystem-based optimisation of deep neural network features. Symmetry 2020, 12, 1146. [Google Scholar] [CrossRef]
- Shekar, B.H.; Mannan, S. Differential Diagnosis of Pulmonary Diseases using Convolutional Neural Network with LSTM Architecture. Res. Sq. 2024, 1–15. [Google Scholar] [CrossRef]
- Goyal, M.K.; Yadav, D.K.; Brar, K.K.; Mittal, M.; Ojha, A.; Alzubaidi, L.H.; Pathani, A. TB Chest X-Ray Diagnostic Technique Using Deep Learning. In Proceedings of the 2024 3rd Edition of IEEE Delhi Section Flagship Conference (DELCON), New Delhi, India, 21–23 November 2024; pp. 1–4. [Google Scholar] [CrossRef]
- Rehman, A.U.; Bajwa, T.H.; Bajwa, U.; Toor, W.T. Tuberculosis Detection in Chest X-Rays Using Hybrid Deep Learning Models. In Proceedings of the 2024 3rd International Conference on Emerging Trends in Electrical, Control, and Telecommunication Engineering (ETECTE), Lahore, Pakistan, 26–27 November 2024; pp. 1–6. [Google Scholar] [CrossRef]
- Hooda, R.; Mittal, A.; Sofat, S. Automated TB classification using ensemble of deep architectures. Multimed. Tools Appl. 2019, 78, 31515–31532. [Google Scholar] [CrossRef]
- Khatibi, T.; Shahsavari, A.; Farahani, A. Proposing a novel multi-instance learning model for tuberculosis recognition from chest X-ray images based on CNNs, complex networks and stacked ensemble. Phys. Eng. Sci. Med. 2021, 44, 291–311. [Google Scholar] [CrossRef] [PubMed]
- Rajaraman, S.; Cemir, S.; Xue, Z.; Alderson, P.O.; Thoma, G.R.; Antani, S. A novel stacked model ensemble for improved TB detection in chest radiographs. In Medical Imaging; CRC Press: Boca Raton, FL, USA, 2019; pp. 1–26. [Google Scholar]
- Liu, M.-M.C.; Wu, Y.-H.; Ban, Y.; Wang, H. TBX11K Dataset. Available online: https://www.kaggle.com/datasets/usmanshams/tbx-11/data (accessed on 1 January 2025).
- Abbas, A.; Abdelsamea, M.M.; Gaber, M.M. Detrac: Transfer learning of class decomposed medical images in convolutional neural networks. IEEE Access 2020, 8, 74901–74913. [Google Scholar] [CrossRef]
- Rahman, T.; Khandakar, A.; Chowdhury, M. Tuberculosis (TB) Chest X-Ray Database, IEEE Dataport, 20 October 2020. Available online: https://ieee-dataport.org/documents/tuberculosis-tb-chest-x-ray-database (accessed on 1 January 2025). [CrossRef]
- Rosenthal, A.; Gabrielian, A.; Engle, E.; Hurt, D.E.; Alexandru, S.; Crudu, V.; Sergueev, E.; Kirichenko, V.; Lapitskii, V.; Snezhko, E.; et al. The TB portals: An open-access, web-based platform for global drug-resistant-tuberculosis data sharing and analysis. J. Clin. Microbiol. 2017, 55, 3267–3282. [Google Scholar] [CrossRef]
- Radiological Society of North America. RSNA Pneumonia Detection Challenge. Available online: https://www.kaggle.com/c/rsna-pneumonia-detection-challenge/data (accessed on 1 January 2025).
- Saadah, H.; Owda, A.Y.; Owda, M. Convolutional neural networks breast cancer classification using Palestinian mammogram dataset. Indones. J. Electr. Eng. Comput. Sci. 2024, 36, 1149–1162. [Google Scholar] [CrossRef]
- Abumohsen, M.; Costa-Montenegro, E.; García-Méndez, S.; Owda, A.Y.; Owda, M. Advanced Deep Learning Techniques for Accurate Lung Cancer Detection and Classification. In Proceedings of the 2025 12th International Conference on Information Technology (ICIT), Amman, Jordan, 27–30 May 2025; pp. 7–12. [Google Scholar] [CrossRef]
- Howard, A.; Sandler, M.; Chu, G.; Chen, L.-C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for mobilenetv3. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1314–1324. [Google Scholar]
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.-C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 4510–4520. [Google Scholar] [CrossRef]
- Zhang, X.; Zhou, X.; Lin, M.; Sun, J. Shufflenet: An extremely efficient convolutional neural network for mobile devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2018, Salt Lake City, UT, USA, 18–22 June 2018; pp. 6848–6856. [Google Scholar] [CrossRef]
- Ma, N.; Zhang, X.; Zheng, H.-T.; Sun, J. Shufflenet v2: Practical guidelines for efficient cnn architecture design. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 116–131. [Google Scholar] [CrossRef]
- Shaw, P.; Uszkoreit, J.; Vaswani, A. Self-attention with relative position representations. arXiv 2018, arXiv:1803.02155. [Google Scholar] [CrossRef]
- Li, J.; Tu, Z.; Yang, B.; Lyu, M.R.; Zhang, T. Multi-head attention with disagreement regularization. arXiv 2018, arXiv:1810.10183. [Google Scholar] [CrossRef]
- Lalande, A.; Chen, Z.; Pommier, T.; Decourselle, T.; Qayyum, A.; Salomon, M.; Ginhac, D.; Skandarani, Y.; Boucher, A.; Brahim, K.; et al. Deep learning methods for automatic evaluation of delayed enhancement-MRI. The results of the EMIDEC challenge. Med. Image Anal. 2022, 79, 102428–102441. [Google Scholar] [CrossRef]
- Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016; Volume 1. [Google Scholar]
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
- Owess, M.M.; Owda, A.Y.; Owda, M.; Massad, S. Supervised Machine Learning-Based Models for Predicting Raised Blood Sugar. Int. J. Environ. Res. Public Health 2024, 21, 840. [Google Scholar] [CrossRef]
- Eisentraut, L.; Mai, C.; Hosch, J.; Benecke, A.; Penava, P.; Buettner, R. Deep Learning Based Detection of Tuberculosis Using a Gaussian Chest X-Ray Image Filter as a Software Lens. IEEE Access 2025, 1–23. [Google Scholar] [CrossRef]
- Abumihsan, A.; Owda, M.; Owda, A.Y.; Gasir, F.; Abumohsen, M.; Stergioulas, L. A Novel Deep Learning Approach for Enhanced Ischemic Brain Stroke Detection from CT Images Using Deep Feature Extraction and Optimized Feature Selection. In Proceedings of the 2025 12th International Conference on Information Technology (ICIT), Amman, Jordan, 27–30 May 2025; pp. 1–6. [Google Scholar] [CrossRef]
- Raza, R.; Zulfiqar, F.; Khan, M.O.; Arif, M.; Alvi, A.; Iftikhar, M.A.; Alam, T. Lung-EffNet: Lung cancer classification using EfficientNet from CT-scan images. Eng. Appl. Artif. Intell. 2023, 126, 106902–106921. [Google Scholar] [CrossRef]
- Babu Vimala, B.; Srinivasan, S.; Mathivanan, S.K.; Mahalakshmi, M.; Jayagopal, P.; Dalu, G.T. Detection and classification of brain tumor using hybrid deep learning models. Sci. Rep. 2023, 13, 23029–23042. [Google Scholar] [CrossRef] [PubMed]
- Ali, M.M.; Maqsood, F.; Liu, S.; Hou, W.; Zhang, L.; Wang, Z. Enhancing Breast Cancer Diagnosis with Channel-Wise Attention Mechanisms in Deep Learning. Comput. Mater. Contin. 2023, 77, 1–19. [Google Scholar] [CrossRef]
- Choudhary, T.; Gujar, S.; Goswami, A.; Mishra, V.; Badal, T. Deep learning-based important weights-only transfer learning approach for COVID-19 CT-scan classification. Appl. Intell. 2023, 53, 7201–7215. [Google Scholar] [CrossRef]
- Lyken17. THOP: PyTorch-OpCounter. Available online: https://github.com/Lyken17/pytorch-OpCounter (accessed on 1 December 2025).
- Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. Pytorch: An imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 2019, 32, 1–12. [Google Scholar]
- Chowdary, G.J.; Reddy, S.K.; Srinivas, K.; Rani, P.; Reddy, V.R.; Rao, P.V.R. Class dependency based learning using Bi-LSTM coupled with the transfer learning of VGG16 for the diagnosis of Tuberculosis from chest X-Rays. arXiv 2021, arXiv:2108.04329. [Google Scholar] [CrossRef]
| References | Dataset | Methods | Results | Limitations |
|---|---|---|---|---|
| [33] | Montgomery, Shenzhen, and JSRT | Inception V-3, SVM, KNN, RF, and NN | NN (average accuracy: 80.45%; F1-score: 81.1%; precision: 81.1%; recall: 81.1%; and AUC: 0.894) | The used dataset is small (138 + 662 + 247). The proposed model’s performance is very low. |
| [34] | Montgomery and Shenzhen | HGF, FOSF, SVM | Montgomery dataset: Accuracy = 95.60%; AUC = 0.95. Shenzhen dataset: Accuracy = 99.40%; AUC = 0.99. | Very limited dataset size (138 + 662). The study used handcrafted shape and texture features that cannot learn complex hierarchical patterns. |
| [35] | Montgomery and Shenzhen | Several handcrafted feature extraction methods, SVM | Montgomery dataset: AUC = 87%; Accuracy = 78.3%. Shenzhen dataset: AUC = 90%; Accuracy = 84%. | Very limited dataset size (138 + 662). The study used handcrafted feature extraction methods, which cannot learn complex hierarchical patterns. The proposed model's performance is very low. |
| [36] | Shenzhen | HOG, KNN | Accuracy: 71.81% | The used dataset is very small (662). The performance of the suggested model is very poor. The research employed handcrafted feature extraction techniques that are incapable of learning intricate hierarchical patterns. |
| [17] | Shenzhen | Deep CNN | The lossless augmentation method attained an accuracy of 70%, while the lossy augmentation method attained an accuracy of 64%. | The used dataset is very small (662). The suggested model's performance is very low. The study employed a single deep CNN model, which may limit its capacity to extract and focus on the most significant features. |
| [18] | Montgomery, Shenzhen, Belarus | CNN | Accuracy: 86.2%; AUC: 0.925 | Performance is very low. The study used a single model. Very limited dataset size (138 + 662 + 306). |
| [19] | Montgomery, Shenzhen | B-CNN | Montgomery (Accuracy = 96.42%) Shenzhen (Accuracy = 86.46%) | Very limited dataset size (138 + 662). Performance is relatively low. The study employed a single model. |
| [20] | Montgomery, Shenzhen, and private dataset (FAHXJU) | Faster RCNN with reinforcement learning | Montgomery dataset: Accuracy = 92.6%. Shenzhen dataset: Accuracy = 90.2%. | The suggested model's performance is relatively low. |
| [21] | Dhaka-Qatar | CNN | For all classes of diseases (accuracy: 98.72%). For TB detection (precision: 98.9%; recall: 98.1%; F1-Score: 98.5%) | The study relied on a single model, which may lack the capacity to focus on the most crucial features. Did not address the dataset imbalance issue. |
| [39] | NLM, Belarus, NIAID, and RSNA | ResNet18, ResNet50, ResNet101, ChexNet, InceptionV3, Vgg19, DenseNet201, SqueezeNet, and MobileNet | DenseNet201 (accuracy: 98.6%; precision: 98.57%; sensitivity: 98.56%; F1-score: 98.56%; and specificity: 98.54%) | High computational cost. Single-model dependency. |
| [40] | NIH, Shenzhen, and Montgomery | MobileNet | Overall accuracy: 98.66%. For the normal class (recall: 98.66%; precision: 99.41%; F1-score: 98.6%; specificity: 99.41%). For the infected class (recall: 99.42%; precision: 97.93%; F1-score: 98.6%; specificity: 97.93%) | Single-model dependency. |
| [41] | NIAID, and Dhaka-Qatar | DenseNet121 | Accuracy: 90%; precision: 92%; recall: 90%; F1-score: 89%; AUC score: 0.976 | Performance is relatively low. High computational cost. The study relied on a single model. |
| [42] | Montgomery, Shenzhen, and a private dataset (Thailand) | AlexNet, VGG-16, CapsNet | CapsNet (accuracy: 80.06%; sensitivity: 92.72%) | Low performance. Single-model dependency. |
| [43] | Montgomery and Shenzhen | VGG16 | Accuracy without augmentation: 80%. Accuracy with data augmentation: 81.25%. | Very limited dataset size (138 + 662). Very low performance. The study relied on a single model. |
| [44] | Montgomery, Shenzhen, and Dhaka-Qatar | CBAM, WDnet | Accuracy: 98.80%; sensitivity: 94.28%; precision: 98.50%. Specificity: 95.7%, F1-score: 96.35%. | The model is computationally intensive. |
| [45] | Montgomery and Shenzhen | ResNet with a simple external attention mechanism | Accuracy: 97.59%; sensitivity: 100%; AUC: 97.8%; specificity: 95.56%; precision: 95%. | The used dataset is very small (138 + 662). The proposed model is a shallow network that could struggle to discern complex and hierarchical patterns in very large or heterogeneous datasets. |
| [46] | Shenzhen | ResNet and EfficientNet | Accuracy: 89.92%; AUC: 94.8%. | The used dataset is very small (662). The suggested model's performance is relatively low. |
| [47] | Montgomery, Shenzhen, Kenya, and India | AlexNet, VGG-16, VGG-19, Xception, ResNet-50 | Shenzhen: best accuracy 85.5% (VGG-16). Montgomery: best accuracy 75.8% (Xception). Kenya: best accuracy 69.5% (AlexNet, VGG-16). India: best accuracy 87.6% (VGG-16). | Single-model dependency: although multiple models were evaluated, the study ultimately relied on a single model for classification, which may restrict the system's ability to capture a broader range of crucial features. The suggested model's performance is relatively low. |
| [48] | Shenzhen | MobileNet, AEO | Accuracy = 90.2% | The utilized dataset is very small (662). The suggested model performance is relatively low. |
| [49] | Montgomery, Shenzhen, and Belarus | CNN, LSTM | Accuracy: 96.26%; precision: 96.44%; recall: 96.62%; and F1-score: 96.49%. | Very limited dataset size (138 + 662 + 306). The study did not address the dataset imbalance issue. |
| [50] | Montgomery and Shenzhen | ResNet–SVM | Accuracy: 93.91%; precision: 93%; and AUC: 91%. | Very small dataset (138 + 662). The suggested model performance is relatively low. |
| [51] | NIAID, NLM, and Belarus | CNN, RNN, ANN | Accuracy: 97%; precision: 85%; recall: 90%; F1-score: 88%. | The precision, recall, and F1-score metrics are low. The suggested hybrid model, integrating CNNs, ANN, and RNN, increases computational complexity. |
| [52] | Belarus, Montgomery, Shenzhen, and JSRT | Ensemble technique (AlexNet, GoogleNet, and ResNet) | Accuracy: 88.24%; sensitivity: 88.42%; specificity: 88%; AUC: 0.93. | The performance of the suggested model is low. High computational cost. |
| [53] | Montgomery and Shenzhen | Xception, DenseNet, and stacked ensemble classifier (LR, DT, RF, SVM, and AdaBoost) | Shenzhen (Specificity: 99.47%; Sensitivity: 99.39%; AUC: 0.98; Accuracy: 99.22%). Montgomery (Specificity: 99.15%; Sensitivity: 99.42%; AUC: 0.99; Accuracy: 99.26%) | Very small dataset (138 + 662). High computational cost. |
| [54] | Montgomery, Shenzhen, Kenya, and India | Several handcrafted feature extraction methods (HOG, GIST, and SURF), several pre-trained CNN algorithms (AlexNet, VGG-16, GoogLeNet, ResNet-50), and SVM. | Shenzhen (Accuracy: 93.4%), Montgomery (Accuracy: 87.5%), Kenya (Accuracy: 77.6%), and India (Accuracy: 96.0%). | The performance of the suggested model is relatively low. High computational cost. |
| Attribute | Dataset 1 | Dataset 2 |
|---|---|---|
| Source | CXR database [39] | TBX11K dataset [55] |
| Total Images | 7000 | 1600 |
| TB Cases | 3500 | 800 |
| Normal Cases | 3500 | 800 |
| Training (70%) | 4900 | 1120 |
| Validation (15%) | 1050 | 240 |
| Testing (15%) | 1050 | 240 |
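The stratified 70/15/15 split summarized above can be reproduced with two chained scikit-learn calls. This is a minimal sketch under assumed inputs: the image-path list, label list, and random seed are illustrative, not the authors' actual values.

```python
from sklearn.model_selection import train_test_split

def split_70_15_15(paths, labels, seed=42):
    """Stratified 70/15/15 train/validation/test split of (paths, labels)."""
    # First carve off 30% of the data, stratified by class label.
    X_train, X_rest, y_train, y_rest = train_test_split(
        paths, labels, test_size=0.30, stratify=labels, random_state=seed)
    # Split that 30% in half: 15% validation, 15% held-out test.
    X_val, X_test, y_val, y_test = train_test_split(
        X_rest, y_rest, test_size=0.50, stratify=y_rest, random_state=seed)
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)
```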
| Hyperparameter | Value |
|---|---|
| Learning Rate | 0.001 |
| Optimizer | Adam |
| Loss Function | Cross-Entropy Loss |
| Batch Size | 32 |
| Number of Epochs | 10 |
| Dropout Rate | 0.5 |
| Dataset | Model | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) | Parameters (M) | FLOPs (M) |
|---|---|---|---|---|---|---|---|
| Dataset 1 | GhostNet | 95.90 | 96.70 | 95.04 | 95.86 | 3.90 | 52.32 |
| | MobileViT | 98.48 | 98.27 | 98.65 | 98.46 | 1.94 | 227.91 |
| | VGG-16 | 96.00 | 96.56 | 95.46 | 96.01 | 52.48 | 5066.59 |
| | DenseNet-121 | 96.29 | 95.28 | 97.30 | 96.28 | 6.96 | 945.63 |
| | MobileNetV3 | 97.62 | 96.30 | 98.27 | 97.77 | 4.20 | 78.11 |
| | ResNet50 | 98.29 | 98.11 | 98.48 | 98.30 | 23.51 | 1349.13 |
| | EfficientNet-B7 | 99.14 | 99.56 | 98.24 | 99.11 | 63.79 | 5344.54 |
| Dataset 2 | GhostNet | 94.98 | 95.65 | 94.02 | 94.78 | 3.90 | 52.32 |
| | MobileViT | 98.75 | 98.35 | 99.17 | 98.76 | 1.94 | 227.91 |
| | VGG-16 | 97.08 | 98.15 | 96.12 | 97.10 | 52.48 | 5066.59 |
| | DenseNet-121 | 95.83 | 95.43 | 97.22 | 96.24 | 6.96 | 945.63 |
| | MobileNetV3 | 98.33 | 98.20 | 98.64 | 98.41 | 4.20 | 78.11 |
| | ResNet50 | 96.67 | 96.16 | 97.33 | 96.72 | 23.51 | 1349.13 |
| | EfficientNet-B7 | 98.75 | 98.32 | 99.15 | 98.73 | 63.79 | 5344.54 |
| Dataset | Model | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) | Parameters (M) | FLOPs (M) | Inference Time (ms) |
|---|---|---|---|---|---|---|---|---|
| Dataset 1 | GhostNet | 95.90 | 96.70 | 95.04 | 95.86 | 3.90 | 52.32 | 33.01 |
| | MobileViT | 98.48 | 98.27 | 98.65 | 98.46 | 1.94 | 227.91 | 27.24 |
| | Proposed Model | 99.52 | 99.61 | 99.42 | 99.51 | 7.73 | 282.11 | 65.28 |
| Dataset 2 | GhostNet | 94.98 | 95.65 | 94.02 | 94.78 | 3.90 | 52.32 | 34.68 |
| | MobileViT | 98.75 | 98.35 | 99.17 | 98.76 | 1.94 | 227.91 | 29.53 |
| | Proposed Model | 99.17 | 99.20 | 99.20 | 99.20 | 7.73 | 282.11 | 66.96 |
| Metric | GhostNet | MobileViT | Sum (Expected) | Hybrid (Actual) |
|---|---|---|---|---|
| Parameters (M) | 3.90 | 1.94 | 5.84 | 7.73 |
| FLOPs (M) | 52.32 | 227.91 | 280.23 | 282.11 |
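Parameter and operation counts of this kind can be measured with the THOP profiler cited in the references. The sketch below is illustrative and assumes the `HybridTBNet` model defined earlier; note that THOP returns multiply-accumulate counts (MACs), which are commonly quoted as FLOPs.

```python
import torch
from thop import profile  # THOP: PyTorch-OpCounter

model = HybridTBNet().eval()                    # illustrative model from Section 3.4.3
dummy = torch.randn(1, 3, 128, 128)             # one 128 x 128, 3-channel input
macs, params = profile(model, inputs=(dummy,))  # returns MACs and parameter count
print(f"MACs: {macs / 1e6:.2f} M, Params: {params / 1e6:.2f} M")
```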
| Fold | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) |
|---|---|---|---|---|
| 1 | 99.57 | 99.71 | 99.43 | 99.57 |
| 2 | 98.79 | 98.03 | 99.57 | 98.8 |
| 3 | 99.43 | 99.43 | 99.43 | 99.43 |
| 4 | 99.93 | 99.88 | 99.86 | 99.93 |
| 5 | 99.55 | 99.62 | 99.42 | 99.52 |
| Mean ± SD | 99.45 ± 0.37 | 99.33 ± 0.67 | 99.54 ± 0.17 | 99.45 ± 0.37 |
| 95% CI | [98.99, 99.92] | [98.50, 100.16] | [99.33, 99.75] | [98.99, 99.91] |
| Fold | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) |
|---|---|---|---|---|
| 1 | 99.38 | 98.77 | 99.98 | 99.38 |
| 2 | 98.75 | 98.93 | 97.5 | 98.73 |
| 3 | 99.15 | 99.17 | 99.29 | 99.16 |
| 4 | 99.89 | 99.91 | 99.93 | 99.88 |
| 5 | 98.56 | 98.73 | 99.17 | 98.56 |
| Mean ± SD | 99.15 ± 0.47 | 99.10 ± 0.43 | 99.17 ± 0.90 | 99.14 ± 0.47 |
| 95% CI | [98.56, 99.73] | [98.56, 99.64] | [98.06, 100.29] | [98.56, 99.73] |
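The summary statistics in these tables can be reproduced from the per-fold scores. The sketch below uses the dataset 1 accuracies; matching the reported SD of 0.37 requires the population standard deviation (`ddof=0`), and the 95% confidence interval uses the t-distribution with n − 1 = 4 degrees of freedom.

```python
import numpy as np
from scipy import stats

# Fold accuracies for dataset 1, taken from the table above.
acc = np.array([99.57, 98.79, 99.43, 99.93, 99.55])
mean, sd = acc.mean(), acc.std(ddof=0)   # population SD reproduces the reported 0.37
# 95% CI via the t-distribution with n - 1 degrees of freedom.
margin = stats.t.ppf(0.975, df=len(acc) - 1) * sd / np.sqrt(len(acc))
print(f"{mean:.2f} ± {sd:.2f}, 95% CI: [{mean - margin:.2f}, {mean + margin:.2f}]")
# Prints: 99.45 ± 0.37, 95% CI: [98.99, 99.92]
```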
| Dataset | Batch Size | Learning Rate | Accuracy (%) |
|---|---|---|---|
| Dataset 1 | 16 | 0.0001 | 98.86 |
| | | 0.0005 | 97.43 |
| | | 0.001 | 98.95 |
| | 32 | 0.0001 | 98.57 |
| | | 0.0005 | 99.24 |
| | | 0.001 | 99.53 |
| | 64 | 0.0001 | 99.14 |
| | | 0.0005 | 98.38 |
| | | 0.001 | 98.29 |
| Dataset 2 | 16 | 0.0001 | 97.92 |
| | | 0.0005 | 97.08 |
| | | 0.001 | 99.01 |
| | 32 | 0.0001 | 99.12 |
| | | 0.0005 | 99.19 |
| | | 0.001 | 99.19 |
| | 64 | 0.0001 | 97.92 |
| | | 0.0005 | 98.75 |
| | | 0.001 | 97.5 |
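The grid in this table corresponds to a straightforward exhaustive search over batch size and learning rate. A hypothetical driver is sketched below; `train_and_evaluate` is an assumed helper standing in for one full training and test run, not a function from the authors' code.

```python
# Exhaustive search over the batch-size / learning-rate grid of the table above.
best_acc, best_cfg = 0.0, None
for batch_size in (16, 32, 64):
    for lr in (1e-4, 5e-4, 1e-3):
        acc = train_and_evaluate(batch_size=batch_size, lr=lr)  # assumed helper
        if acc > best_acc:
            best_acc, best_cfg = acc, (batch_size, lr)
print(f"Best: batch={best_cfg[0]}, lr={best_cfg[1]}, accuracy={best_acc:.2f}%")
```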
| Reference | Dataset | Methods | Results |
|---|---|---|---|
| [18] | Montgomery, Shenzhen, Belarus | CNN | Accuracy: 86.2% |
| [19] | Montgomery, Shenzhen | B-CNN | Montgomery (Accuracy = 96.42%). Shenzhen (Accuracy = 86.46%) |
| [21] | Dhaka-Qatar | CNN | Accuracy: 98.72%; Precision: 98.9%; Recall: 98.1%; F1-score: 98.5% |
| [39] | NLM, Belarus, NIAID, and RSNA | DenseNet201 | Accuracy: 98.6%; Precision: 98.57%; F1-score: 98.56% |
| [40] | NIH, Shenzhen, Montgomery | MobileNet | Overall accuracy: 98.66% |
| [41] | NIAID, Dhaka-Qatar | DenseNet121 | Accuracy: 90%; Precision: 92%; Recall: 90%; F1-score: 89% |
| [44] | Montgomery, Shenzhen, and Dhaka-Qatar | CBAM, WDnet | Accuracy: 98.80%; Precision: 98.50%; F1-score: 96.35% |
| [49] | Montgomery, Shenzhen, and Belarus | CNN, LSTM | Accuracy: 96.26%; Precision: 96.44%; Recall: 96.62%; F1-score: 96.49% |
| [80] | Montgomery, Shenzhen | VGG16, Bi-LSTM | Shenzhen (Accuracy: 97.76%); Montgomery (Accuracy: 96.42%) |
| [53] | Montgomery, Shenzhen | Xception, DenseNet, and stacked ensemble classifier (LR, DT, RF, SVM, and AdaBoost) | Shenzhen (Accuracy: 99.22%); Montgomery (Accuracy: 99.26%) |
| [72] | CXR dataset (dataset 1) | ResNet50 | Accuracy: 99.2%; F1-score: 99.18% |
| This study | Dataset 1 | Hybrid model (GhostNet and MobileViT) | Accuracy: 99.52%; Precision: 99.61%; Recall: 99.42%; F1-score: 99.51%; Parameters: 7.73M; FLOPs: 282.11M |
| This study | Dataset 2 | Hybrid model (GhostNet and MobileViT) | Accuracy: 99.17%; Precision: 99.2%; Recall: 99.2%; F1-score: 99.2%; Parameters: 7.73M; FLOPs: 282.11M |