1. Introduction
The COVID-19 pandemic has emerged as an urgent global health crisis. The disease, designated COVID-19 by the World Health Organization (WHO) [1], is a highly contagious and dangerous respiratory infection caused by a coronavirus. Originating in Wuhan, China in November 2019, COVID-19 rapidly spread across the globe, earning its status as a pandemic. Its rapid transmission elevated it to the forefront of global medical concerns.
Early diagnosis of COVID-19 is a critical factor in controlling its spread and mitigating its impact on public health, especially in regions with a rapid increase in confirmed cases. In this context, medical imaging techniques, particularly the analysis of Chest X-ray images, have demonstrated potential as an invaluable tool in the diagnostic process [2,3]. Recently, Deep Learning (DL) techniques [4] have led to remarkable advances in the analysis of medical data, including medical images, where they offer great potential for extracting subtle features [5], particularly in the context of COVID-19 [4].
For this study, we collected data from two different public resources: the first was the Extensive and Augmented COVID-19 X-ray and CT Chest Images Dataset [6], and the second was a dataset hosted on Kaggle, which was originally collected from [7]. The collected datasets each consisted of 15,000 X-ray images. We divided the work into two main tasks: the first was a binary classification task, in which the given X-ray images were classified as Normal or Abnormal, while the second was a multi-class classification task, in which the X-ray images were classified into one of three classes: Normal, COVID-19, or Pneumonia.
This work included several new non-trivial extensions to the preliminary version from our research in [
8]:
In this work, we used larger datasets: 15,000 X-ray images, compared with 7800 images in our previous work [8]. The datasets in this work were also balanced, meaning that the number of samples in each class was equal, whereas the dataset in [8] was imbalanced, which led to inflated performance metrics. With balanced and larger datasets, the models faced a more challenging classification task, which resulted in more realistic and reliable performance metrics.
In this research, we used data pre-processing techniques that differed from those used in [8]. We deployed more data-augmentation techniques, such as image color conversion (from gray-scale to RGB) using the OpenCV library [9] (Python 3.10, Google Colab version). In addition, we used different types of data augmentation to increase the number of images used to train the proposed models, so that they could detect COVID-19 more accurately. The total number of images after data augmentation was 553,500.
We also utilized deep learning architectures that differed from the previously used architectures. In the previous work, we limited the approaches to Inception-V3, Xception, and MobileNet and focused only on the binary task, while in this work we utilized six distinct CNN models (Xception, Inception-V3, ResNet50, VGG19, DenseNet201, and InceptionResNet-V2) for both binary (Normal vs. Abnormal) and multi-class (Normal, COVID-19, Pneumonia) classification tasks.
The utilized models are promising, as they achieved highly competitive results compared to the previous research: 98.13% accuracy, 98.14% precision, 97.65% recall, and a 97.89% F1-score in binary classification, and 87.73% accuracy, 90.20% precision, 87.73% recall, and an 87.49% F1-score in multi-class classification. Compared with recently published work, the obtained results were clearly better in multi-class classification, especially for the ResNet50 and Xception models.
In addition to the foregoing technical contributions, we have conducted a literature review, and we discuss many related research efforts in this area.
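The accuracy, precision, recall, and F1-score values quoted above can all be derived from a confusion matrix. The following is a minimal sketch only: the confusion-matrix counts are hypothetical, and macro averaging over classes is an assumption, since this section does not state which averaging scheme was used. It also illustrates a general property of balanced test sets, namely that macro-averaged recall coincides with accuracy.

```python
def classification_metrics(confusion):
    """Compute accuracy and macro-averaged precision, recall, and F1.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n_classes))

    precisions, recalls, f1_scores = [], [], []
    for k in range(n_classes):
        true_positives = confusion[k][k]
        predicted_as_k = sum(confusion[i][k] for i in range(n_classes))
        actually_k = sum(confusion[k])
        p = true_positives / predicted_as_k if predicted_as_k else 0.0
        r = true_positives / actually_k if actually_k else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1_scores.append(f1)

    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / n_classes,
        "recall": sum(recalls) / n_classes,
        "f1": sum(f1_scores) / n_classes,
    }


# Hypothetical 3-class confusion matrix (Normal, COVID-19, Pneumonia)
# with 100 test images per class; the counts are invented for illustration.
example_confusion = [
    [95, 2, 3],
    [4, 85, 11],
    [6, 9, 85],
]
metrics = classification_metrics(example_confusion)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because every class in the hypothetical matrix has the same number of samples, the macro-averaged recall and the accuracy printed by this sketch are identical.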
The remainder of this paper is structured as follows.
Section 2 covers the related work.
Section 3 describes the applied methodology, the dataset, and how we utilized several transfer models for the two classification tasks.
Section 4 describes the different evaluation metrics we used to compare the various machine learning classifiers and covers the evaluation design and results.
Section 5 provides a detailed discussion of the model performance compared with existing works, the role of data augmentation in the model performance, the potential for real-world applications and clinical integration, and the limitations and future directions of the proposed approach.
Section 6 discusses the research findings and concludes the paper with avenues for future work.
2. Literature Review
Recently, deep learning technologies have been applied to the effective and timely detection of COVID-19 from different image types [10]. In this section, we review previous studies focused on utilizing X-rays to detect COVID-19. Several surveys have provided a comprehensive overview of the approaches used in COVID-19 detection using different types of medical images. In [4], Bhosale et al. presented a systematic review of the recent DL techniques utilized to classify COVID-19 from different lung and chest imaging modalities. Other systematic reviews have recently been conducted in [11,12,13].
Some previous studies have leveraged CT scan images for the detection of COVID-19. Zheng et al. proposed DeCoVNet [14] to detect whether a patient is infected with Coronavirus, based on CT images. First, they used a pre-trained UNET model to segment the lung regions, and then they used a 3D deep neural network for prediction. To evaluate the proposed software, they collected CT images for the training and test sets from 13 December 2019 to 6 February 2020 and used data augmentation to increase the dataset size; the ROC AUC achieved was 0.959. In [15], Li et al. proposed a method to distinguish between COVID-19, Community-Acquired Pneumonia (CAP), and other non-pneumonia cases, using transfer learning techniques on chest CTs. In the proposed method, they used UNET for segmentation and ResNet50 to extract features; they then fed these features into max pooling and, finally, into a fully connected layer with a softmax activation function for prediction. The dataset used to evaluate this method was collected in six hospitals from 3322 patients, and it contained 4356 chest CT images. They found that the deep learning model demonstrated high accuracy in detecting COVID-19 and effectively distinguishing it from Community-Acquired Pneumonia and other lung diseases, where the AUC for COVID-19 was 0.96. Shi et al. [16] aimed to build a model that would predict the symptoms and severity of patients with Coronavirus, based on CT and initial clinical features. They used MOICT with some features that they obtained from health organizations, and they used the LASSO logistic regression model, which achieved accuracy of 0.890, higher than the results of other approaches, such as using MOICT, POICT, or PSI.
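The classification head described for [15] (per-slice features, max pooling, a fully connected layer, and softmax) can be sketched in plain Python. This is an illustration under stated assumptions, not the implementation from [15]: the feature vectors, weights, and biases are hypothetical, the feature dimension is reduced to four for readability, and the ResNet50 feature extractor itself is not reproduced.

```python
import math


def max_pool_over_slices(slice_features):
    """Element-wise maximum over the feature vectors of all CT slices."""
    return [max(values) for values in zip(*slice_features)]


def fully_connected(features, weights, biases):
    """One dense layer: one output logit per row of `weights`."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]


def softmax(logits):
    """Numerically stable softmax that turns logits into probabilities."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 4-dimensional features for three slices of one CT scan.
slice_features = [
    [0.1, 0.9, 0.0, 0.3],
    [0.7, 0.2, 0.1, 0.3],
    [0.2, 0.4, 0.8, 0.1],
]
# Hypothetical weights for three classes: COVID-19, CAP, non-pneumonia.
weights = [
    [1.0, 0.5, -0.5, 0.0],
    [-0.5, 1.0, 0.5, 0.0],
    [0.0, -1.0, 0.0, 1.0],
]
biases = [0.0, 0.0, 0.0]

pooled = max_pool_over_slices(slice_features)
probabilities = softmax(fully_connected(pooled, weights, biases))
print(pooled)
print(probabilities)
```

Max pooling across slices keeps the strongest response of each feature anywhere in the scan, so a scan-level prediction does not depend on how many slices the scan contains.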
Moreover, Chen et al. [17] utilized a deep learning model to detect pneumonia in COVID-19 patients using CT scan images, with the additional aim of reducing the workload of radiologists. Their dataset was collected from Renmin Hospital, Wuhan University, and comprised 46,096 CT images from 106 patients, including 51 patients infected with COVID-19 pneumonia and 55 control patients with other diseases. They applied the UNet++ model to 289 randomly selected CT images, which had been labeled by experts to identify the intact area of each image, and the UNet++ model was tested on other randomly selected CT images. The UNet++ model achieved excellent results on the testing data, where the accuracy reached 100%. Song et al. [18] proposed a deep learning model to detect COVID-19-infected patients from Computed Tomography (CT) images. They collected the CT images from more than one hospital; the dataset contained 88 CT scan images of COVID-19 patients, 101 of patients with pneumonia caused by other viruses, and 86 of healthy persons. In their approach, they used the DRE-Net model and achieved an AUC of 0.99 and accuracy of 0.94 on the test set. Wang et al. [19] proposed a deep learning model to screen for and diagnose COVID-19 pneumonia based on CT scan images. Their dataset consisted of 453 CT images from infected patients and healthy persons. They used the Inception model as a baseline and added some enhancements so that the last layers of the model could be trained. The proposed model achieved accuracy of 82.9%. Xu et al. [20] proposed a deep learning model to examine COVID-19 pneumonia patients based on CT scan images. Their dataset consisted of 618 CT images of patients with COVID-19, patients with Influenza-A viral pneumonia, and healthy persons. They used the location-attention-oriented model and achieved accuracy of 86.7%.
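Several of the studies above report the area under the ROC curve (AUC) rather than accuracy. As a brief illustration, the AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the labels and scores are hypothetical and are not taken from any of the cited studies.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for a positive (e.g., COVID-19) case, 0 for a negative case.
    scores: model outputs, where a higher score means "more likely positive".
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correctly ranked pair
    return wins / (len(positives) * len(negatives))


# Hypothetical scores for six scans (three positive, three negative).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise form is quadratic in the number of samples, which is acceptable for a small illustration; rank-based formulations give the same value more efficiently on large test sets.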
On the other hand, numerous studies have used X-ray images to detect COVID-19, employing various approaches. Prabira et al. [21] proposed a transfer learning technique that combined ResNet50 and SVM to classify COVID-19 using X-ray images. The ResNet50 model was selected as the best after testing eight pre-trained models, including AlexNet, VGG16, VGG19, GoogleNet, ResNet50, ResNet101, and XceptionNet. The authors used ResNet50 to extract deep features and then used SVM for classification. Finally, the proposed method was evaluated using datasets collected from GitHub, Kaggle, and Open-i. The accuracy achieved was 95.38%. To detect pneumonia using X-ray images, Rajpurkar et al. [5] proposed an algorithm called CheXNet, a 121-layer Convolutional Neural Network trained on the ChestX-ray14 dataset, which contains 112,120 frontal-view X-ray images labeled with 14 different classes, including pneumonia. Gozes et al. [22] proposed an AI-based tool that uses CT images for Coronavirus detection. This tool combines 2D and 3D deep learning models. They applied segmentation techniques to extract lung regions and then trained a ResNet50-2D model on six different datasets of patients from China and the U.S. The sensitivity and specificity achieved were 98.2% and 92.2%, respectively.
Nahiduzzaman et al. [23] proposed a novel model called the Chest X-ray6 model, which is based on a lightweight CNN, to detect five diseases, including COVID-19. They applied their approach to 9514 Chest X-ray images in six classes, which they collected from different databases. Because the dataset was unbalanced, they applied five different augmentation techniques, which increased the number of images to 21,000. They addressed two classification tasks, binary and multi-class, and their model achieved accuracy of 97.94% and 80% for these tasks, respectively. In [24], Constantinou et al. studied the potential of a deep learning-based approach to detecting COVID-19. They focused on five models: ResNet50, ResNet101, DenseNet121, DenseNet169, and InceptionV3. They evaluated their models on the COVID-QU-Ex dataset, which contains 33,920 X-ray images in three classes: COVID-19, Non-COVID-19, and Normal. The results showed that ResNet101 outperformed the other models, with accuracy of 96%.
In [25], Dawar et al. developed a system that could distinguish COVID-19 from other conditions in X-ray images. The dataset was collected from an open repository and contained 15,000 images categorized into COVID-19, Pneumonia, and Normal. Four Convolutional Neural Network models were used: VGGNet, LeNet5, AlexNet, and a custom model consisting of five convolutional layers followed by four dense layers. The results showed that the custom model performed best, with accuracy reaching 93.96%. In [26], Chakraborty et al. proposed a deep learning model based on the pre-trained ResNet18 model. First, they collected 10,040 X-ray images from different open sources, like Kaggle and GitHub, to detect COVID-19 images among images classified as pneumonia and normal. Then, they applied their preprocessing steps and fed the images into the model. Their model achieved accuracy of 96.43% and sensitivity of 93.68%.
Gupta et al. [27] developed a system based on X-ray images to detect COVID-19. They used an open-source dataset with 2905 images labeled into three classes: COVID-19, viral pneumonia, and healthy. The deep learning models used in this study were VGG16, MobileNetV2, ResNet18, and AlexNet. They concluded that the AlexNet model outperformed the other models, with accuracy of 97.6%. In [28], Dhiman et al. developed 11 Convolutional Neural Network (CNN)-based models (AlexNet, VGG16, VGG19, GoogleNet, ResNet18, ResNet50, ResNet101, InceptionV3, InceptionResNetV2, DenseNet201, and XceptionNet) to detect COVID-19 using X-ray images by classifying them into COVID-19 and normal. The dataset used was collected from open repositories like GitHub and Kaggle. They found that ResNet101 with the J48 decision tree classifier outperformed the other models, with accuracy of 98.54%. In [29], Narin et al. proposed five transfer learning methods based on pre-trained models (ResNet50, ResNet101, ResNet152, InceptionV3, and Inception-ResNetV2). They applied their models to three different binary X-ray datasets: (1) COVID-19 and Normal, (2) COVID-19 and Viral Pneumonia, and (3) COVID-19 and Bacterial Pneumonia. They concluded that the ResNet50 model achieved the best results on all three datasets, with accuracy of 96.1% for dataset 1, 99.5% for dataset 2, and 99.7% for dataset 3. In [30], Ozturk et al. proposed a deep learning model named DarkCovidNet, based on the Darknet-19 model with some modifications to the number of filters and convolutional layers. Their study was based on X-ray images collected from two different resources with three classes. They conducted their study on two classification tasks: binary (COVID and no-findings) and multi-class (COVID, no-findings, and pneumonia). Their model achieved accuracy of 98.08% and 87.02% for these tasks, respectively. Enas in [31] utilized a deep learning framework for early COVID-19 diagnosis using Chest X-ray images, with preprocessing for image enhancement and a classification phase applying pre-trained Convolutional Neural Network models (VGG19 and EfficientNetB0). The best model achieved sensitivity of 0.96, specificity of 0.94, precision of 0.9412, an F1 score of 0.9505, and accuracy of 0.95 for binary classification of COVID-19 and normal Chest X-rays, and classification accuracy of 0.935 for a four-class classification. Recently, the authors of [32] explored the effectiveness of several deep learning models, including Xception, VGG-16, and ResNet. Their work utilized two datasets: the first comprised 4050 X-ray images, and the second had 6378 images. Their results demonstrated that the Xception-Enhanced Model achieved precision of 98.8%, significantly outperforming the ResNet50 model, which had precision of 60%. The standard Xception model and VGG-16 also performed well, with precisions of 86.74% and 92%, respectively.
Table 1 provides a summary of the prior research discussed in this study.
Recently, several studies in the detection of COVID-19 from Chest X-rays have been introduced, utilizing different deep learning models. Bukhari et al. [
33] utilized DenseNet169, demonstrating validation accuracy of 100%, outperforming the ResNet and VGG models. Roy et al. [
34] proposed a model combining Xception, InceptionV3, and ResNext50, resulting in accuracy of 98.44%, which showed a 4.44% improvement over prior studies. In comparison, the Xception model used in this work for both binary and multi-class classification achieved a similar binary classification accuracy of 98.13% and a multi-class accuracy of 87.73%. The results obtained in our research illustrate the proposed approach’s competitiveness with other state-of-the-art models, especially for multi-class classification tasks, where our model performed comparably to more complex architectures. Moreover, Ramkumar et al. [
35] have proposed a new approach that combines MobileNetv1 with Jellyfish Search Optimization for COVID-19 detection. The proposed method includes multi-head attention mechanisms that improve precision and computational efficiency. However, our proposed model maintains high accuracy and recall without requiring additional optimization techniques, which reinforces the model’s simplicity and robustness. Henna et al. [
36] also used transfer learning techniques, with CLAHE-based data augmentation, to train models like AlexNet and VGG16 on smaller datasets. Our proposed Xception-based model with extensive data augmentation achieved superior results on a much larger dataset, which shows the scalability and efficacy of our proposed approach. More recent proposed models, such as Singh et al. [
37], used VGG16 with transfer learning for COVID-19 detection, showing strong results in feature extraction using data augmentation and pre-trained weights. Ali et al. [
38] utilized a modified CNN with k-Nearest Neighbor to classify COVID-19 severity, reporting 92.80% testing accuracy. Meanwhile, Rashed et al. [
39] utilized a Conditional Cascaded Network (CCN) with transfer learning, showing high precision and specificity using multiple datasets, while Khattab et al. [
40] integrated focal loss with several deep learning models, like InceptionResNet V2 and Xception, and they showed classification accuracy of up to 100% on some datasets. While these proposed works emphasized the adaptability of transfer learning and optimization techniques, our proposed model combines Xception and data augmentation, yielding 87.73% multi-class accuracy and 90.20% precision, thus remaining highly competitive in both performance and simplicity. Additionally, the work proposed by Rashed et al. [
39] employed a CNN approach for COVID-19 diagnosis using Chest X-rays and CT images, demonstrating robust performance metrics across different architectures. Moreover, the work of Khattab et al. [
40] combined transfer learning models and data-mining techniques to address class imbalance. These comparative results indicate that, while several state-of-the-art techniques utilize complex multi-model architectures or novel optimization methods, our proposed approach remains competitive, with a focus on simplicity, transfer learning, and effective data augmentation.
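Focal loss, which Khattab et al. [40] are described above as integrating with several deep learning models, scales the cross-entropy of each sample by a factor that shrinks as the prediction becomes more confident, so that hard examples dominate training. A minimal single-sample sketch follows; the values of `gamma` and `alpha` are illustrative assumptions, since the settings used in [40] are not given in this section.

```python
import math


def cross_entropy(p_true):
    """Standard cross-entropy for one sample: -log(p_true)."""
    return -math.log(p_true)


def focal_loss(p_true, gamma=2.0, alpha=1.0):
    """Focal loss for one sample: -alpha * (1 - p_true)**gamma * log(p_true).

    p_true is the predicted probability of the correct class.
    gamma controls how strongly easy examples are down-weighted;
    alpha is an optional class-balancing weight.
    """
    if not 0.0 < p_true <= 1.0:
        raise ValueError("p_true must be in the interval (0, 1]")
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)


# An easy example (confident and correct) versus a hard example.
easy, hard = 0.9, 0.1
print(f"cross-entropy: easy={cross_entropy(easy):.4f}, hard={cross_entropy(hard):.4f}")
print(f"focal loss:    easy={focal_loss(easy):.4f}, hard={focal_loss(hard):.4f}")
```

With `gamma` set to 0 and `alpha` set to 1, the expression reduces to ordinary cross-entropy, which makes the modulating factor easy to check in isolation.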
Table 1.
Overview of related research in COVID-19 detection using medical imaging.
Ref. | Year | Image Type | Approach | Dataset | Results |
---|---|---|---|---|---|
[40] | 2024 | X-ray | Four models: InceptionResNet V2, MobileNet, Inception V3, and Xception. | Four public datasets. | For the first dataset, the InceptionResNet V2 accuracy was 88.63%. For the second and fourth datasets, the Inception accuracy was 94.35% and 97.67%. |
[32] | 2024 | X-ray | Xception, VGG-16, and ResNet. | Two datasets: the first one comprises 4050 images; the second one has 6378 images. | Precision for Xception-Enhanced Model (98.8%), ResNet50 (60%), Xception (86.74%), and VGG-16 (92%). |
[31] | 2024 | X-ray | CNN models. | 10,192 Normal cases, 3616 positive COVID-19 cases, 1345 Viral Pneumonia cases, and 6012. | Sensitivity of 0.96, specificity of 0.94, precision of 0.9412, F1 score of 0.9505 and accuracy of 0.95. |
[23] | 2023 | X-ray | Lightweight CNN model. | 9514 Chest X-ray images from different databases. | Accuracy for binary classification = 97.94% and for multi-classification task = 80%. |
[24] | 2023 | X-ray | ResNet101. | 33,920 X-ray images called COVID-QU-Ex. | Accuracy = 96%. |
[25] | 2023 | X-ray | Customized model consisting of five convolutional layers followed by four dense layers. | 15,000 X-ray images from open-source repositories. | Accuracy = 93.96%. |
[26] | 2022 | X-ray | ResNet model. | 10,040 X-ray images from different open-source repositories. | Accuracy = 96.43%, Sensitivity = 93.68%. |
[21] | 2020 | X-ray | Transfer learning technique consists of Resnet50 and SVM. | 381 X-ray images from different open-source repositories. | Accuracy = 95.38%. |
[5] | 2017 | X-ray | CheXNet algorithm. | ChestX-ray14 dataset. | CheXNet outperforms radiologists and previous state-of-the-art models. |
[22] | 2020 | CT | Segmentation techniques to extract lung region, and ResNet50-2D for classification. | 6150 CT slices of 157 international patients (China and U.S.). | Sensitivity = 98.2% and specificity = 92.2%. |
[14] | 2020 | CT | Proposed software system called (DeCoVNet), using UNET for segmentation and 3D deep neural network for classification. | Collected CT images. | ROC AUC = 0.959 |
[15] | 2020 | CT | Transfer learning techniques (UNET, ResNet50, max pooling, fully connected layer, and softmax activation function). | Collected in six hospitals from 3322 patients; contained 4356 chest CTs. | AUC = 0.96. |
[16] | 2020 | CT | Based on MOICT and used the LASSO logistic regression model. | CT images of a total of 196 patients. | Accuracy = 89% |
[17] | 2020 | CT | Detecting COVID-19 patients using CT images by using UNet++ model. | 46,096 CT images. | Accuracy = 100%. |
[18] | 2020 | CT | DRE-Net model. | Dataset containing 88 CT scan images of COVID-19 patients. | Accuracy = 94%, AUC = 0.99. |
[19] | 2020 | CT | Using the Inception model. | 453 CT images from infected patients and healthy persons. | Accuracy = 82.9%. |
[20] | 2020 | CT | Using the location-attention-oriented model. | 618 CT images from COVID-19 and Influenza-A viral pneumonia patients and from healthy persons. | Accuracy = 86.7%. |
Recently, Large Language Models (LLMs) [
41] have taken the spotlight in the natural language processing domain. Furthermore, the integration between LLMs and vision enables the users to explore emergent abilities with multimodal data [
42]. Recent advances in Vision–Language Models (VLMs) have demonstrated significant potential for medical imaging analysis and diagnosis tasks [
43]. Lozano et al. introduced μ-Bench, which is a benchmark designed to evaluate the performance of VLMs in microscopy tasks, highlighting the challenges of applying these models in distinguishing between microscopy modalities and domains. Their work revealed that even state-of-the-art VLMs struggle with basic biomedical tasks, which underscores the need for further model development to enhance their utility in medical applications [
44]. Moreover, in [
45] Moon et al. developed MedViLL, which is a BERT-based model tailored for vision–language tasks in radiology, achieving superior performance in diagnosis classification and medical image report retrieval. MedViLL shows the potential for multimodal approaches to improving the generalizability and interpretability of AI models in the medical field. Radford et al. [
46] introduced the use of vision–language models, like CLIP, that leverage large-scale datasets of image–text pairs, to enable the zero-shot learning utilized in various computer vision tasks. Their approach has inspired applications in medical imaging, as in the work developed by Huang et al. [
47], where PLIP, a pathology-specific vision–language model, was trained on a large dataset of images from medical Twitter. PLIP demonstrated state-of-the-art performance in pathology image classification, especially in zero-shot scenarios, illustrating the adaptability of VLMs in medical diagnosis. Zhang et al. [
48] examined the performance of popular multimodal LLMs, such as GPT-4 and Gemini, for a variety of medical imaging tasks, noting strengths in report generation and lesion detection. Panagoulias et al. [
49] evaluated GPT-4’s diagnostic accuracy in pathology, demonstrating promising results but highlighting specific weaknesses in the model’s knowledge graph integration and entity recognition. Generally, the integration of vision–language models in the medical imaging domain is still in its early stages, with ongoing research efforts to improve their robustness and applicability in clinical settings and analysis tasks.
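The zero-shot classification enabled by CLIP-style models can be illustrated with a small sketch: an image embedding is compared with the embeddings of one text prompt per class, and the cosine similarities are converted into probabilities with a temperature-scaled softmax. All embeddings, prompts, and the temperature value below are hypothetical; in practice, the embeddings would come from trained image and text encoders, which are not reproduced here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(image_embedding, text_embeddings, temperature=0.07):
    """Return a probability per prompt from temperature-scaled similarities."""
    similarities = {
        prompt: cosine_similarity(image_embedding, embedding)
        for prompt, embedding in text_embeddings.items()
    }
    top = max(similarities.values())  # subtract the maximum for stability
    exps = {
        prompt: math.exp((s - top) / temperature)
        for prompt, s in similarities.items()
    }
    total = sum(exps.values())
    return {prompt: e / total for prompt, e in exps.items()}


# Hypothetical 3-dimensional embeddings, invented for illustration.
text_embeddings = {
    "a chest X-ray of a healthy patient": [0.9, 0.1, 0.0],
    "a chest X-ray showing COVID-19": [0.1, 0.9, 0.2],
    "a chest X-ray showing pneumonia": [0.0, 0.3, 0.9],
}
image_embedding = [0.2, 0.8, 0.3]

probabilities = zero_shot_classify(image_embedding, text_embeddings)
best_prompt = max(probabilities, key=probabilities.get)
print(best_prompt)
```

No class-specific training is involved in this scheme: adding a new class only requires a new text prompt, which is what makes the approach attractive for medical settings where labeled images are scarce.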