Enhancing COVID-19 Detection: An Xception-Based Model with Advanced Transfer Learning from X-ray Thorax Images

Rapid and precise identification of Coronavirus Disease 2019 (COVID-19) is pivotal for effective patient care, comprehending the pandemic’s trajectory, and enhancing long-term patient survival rates. Despite numerous recent endeavors in medical imaging, many convolutional neural network-based models grapple with the expressiveness problem and overfitting, and the training process of these models is always resource-intensive. This paper presents an innovative approach employing Xception, augmented with cutting-edge transfer learning techniques to forecast COVID-19 from X-ray thorax images. Our experimental findings demonstrate that the proposed model surpasses the predictive accuracy of established models in the domain, including Xception, VGG-16, and ResNet. This research marks a significant stride toward enhancing COVID-19 detection through a sophisticated and high-performing imaging model.


Introduction
The relentless global impact of the COVID-19 pandemic has underscored the critical importance of early and accurate detection in safeguarding public health, mitigating economic repercussions, and ensuring the long-term well-being of communities worldwide [1][2][3].Early detection allows diseases to be diagnosed at a stage when they are more likely to respond to treatment, reduces morbidity and mortality, prevents complications and consequences, and lowers healthcare costs.Timely detection not only informs effective patient care but also serves as a linchpin in elevating long-term survival rates, emphasizing the pressing need for innovative diagnostic methodologies.It, therefore, seems important to think about setting up systems capable of facilitating the early detection of disease to contribute to public healthcare management.
Over the years, medical imaging has emerged as an indispensable tool for the early detection, monitoring, and post-treatment follow-up of diseases.From the inception of computer-aided diagnostic systems in the early 1980s to the contemporary era of advanced artificial intelligence (AI) applications, the trajectory of medical image analysis has evolved significantly.While early approaches focused on sequential processing and mathematical modeling, the advent of AI, inspired by the human brain's learning mechanisms, has ushered in a new era of sophisticated diagnostic systems [4,5].
Machine learning, a cornerstone of this transformative paradigm, empowers software applications to enhance predictive accuracy without explicit programming.At the forefront of global health concerns, the ongoing COVID-19 pandemic, caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), demands innovative solutions for effective patient screening.Despite the widespread adoption of the reverse transcription polymerase chain reaction test, its limited positivity rate and inability to differentiate SARS-CoV-2 from other respiratory infections underscore the urgent need for alternative screening methods [6][7][8].
Artificial intelligence models for rapid disease detection are systems that employ machine learning, image analysis, or natural language processing techniques to identify individuals infected with SARS-CoV-2, the virus responsible for COVID-19.These models can rely on various types of data, such as radiological images, antigen tests, reported symptoms, or genomic data, offering speed, accuracy, user-friendliness, and cost reduction.They play a crucial role in screening suspected cases, guiding patients to appropriate care, monitoring disease progression, and controlling virus spread.
However, detecting and accurately distinguishing between different strains of SARS-CoV-2 poses a formidable challenge in the landscape of disease detection.The evolving nature of the virus, coupled with its propensity for genetic mutations, introduces complexities that demand innovative solutions.One crucial hurdle lies in the scarcity of sufficient, reliable, and representative data for training and validating models, especially for emerging virus variants.These variants may necessitate adaptations or updates to existing models, highlighting the need for continuous vigilance and adjustment.Additionally, the variability in model performance across diverse contexts, populations, environments, and usage protocols further accentuates the intricacy of the task.The ethical, legal, and social dimensions of deploying these models also contribute to the multifaceted challenges, encompassing issues of privacy, data protection, responsibility, transparency, explainability, security, reliability, and trust.The gravity of these challenges necessitates a comprehensive and principled approach to model development and usage, emphasizing the urgency of addressing these intricacies for the advancement of disease detection strategies.
To tackle these challenges, adherence to principles and best practices for the development and use of artificial intelligence models for rapid disease detection is crucial.Guidelines proposed by organizations like the World Health Organization (WHO) [9], the Organisation for Economic Co-operation and Development (OECD), and the European Commission provide valuable insights.
Furthermore, fostering collaboration and data, knowledge, and experience sharing among stakeholders involved in the fight against COVID-19, including researchers, healthcare professionals, policymakers, industry professionals, and citizens, is essential.This collaborative approach can significantly contribute to overcoming the challenges posed by the rapidly evolving viral threat and enhancing the effectiveness of AI models in disease detection.
The key contributions of this paper include the proposition and validation of an innovative model, synthesized from the strengths of existing architectures and enriched through transfer learning.Experimental results demonstrate superior predictive accuracy compared to benchmark models.By advancing the state of the art in COVID-19 detection, this research significantly contributes to global efforts to revolutionize patient care pathways and bolster long-term survival rates.In the next section, an introduction to transfer learning is provided to facilitate a smooth transition into the discussion on its relevance to the proposed deep learning model.
The remainder of this paper is organized as follows.Section 2, "Materials and Methods", presents the methodology used for our innovative deep learning model for automated COVID-19 detection, and the dataset and experimental setup are also outlined.Section 3, "Results", presents the detailed findings from the validation experiments.In Section 4, "Discussion", the implications, challenges, and limitations of the results are comprehensively discussed, along with a comparison with related works.Section 5, "Conclusions", concludes the paper with a synthesis of the key insights and future directions for the proposed model.

Xception Model
Xception, short for "Extreme Inception", is a state-of-the-art deep learning architecture proposed by François Chollet in 2017 [10].It represents a significant advancement in convolutional neural network (CNN) design, particularly tailored for image classification tasks [11].At its core, Xception embodies a fundamental departure from conventional CNN architectures, introducing a novel approach to convolution operations.
Traditional CNNs rely on standard convolutional layers to extract features from input images.These layers apply a set of learnable filters across the entire input volume, producing feature maps that capture spatial patterns.However, this approach often leads to an excessive number of parameters, resulting in computational inefficiency and increased risk of overfitting.
In contrast, Xception introduces depth-wise separable convolutions, a concept borrowed from the Inception family of architectures [12].Depth-wise separable convolutions decompose the standard convolution operation into two distinct stages: depth-wise convolution and point-wise convolution.
Let F represent the input feature map, K denote the kernel (or filter), and S signify the stride length.The depth-wise convolution operation is defined as: where F ′ ij represents the output feature map and K mn denotes the corresponding element of the kernel.By performing convolutions independently across each channel of the input feature map, depth-wise convolutions significantly reduce computational complexity while preserving spatial information.
Following the depth-wise convolution, point-wise convolutions are applied to integrate information across channels.This operation is expressed as: where K k represents the k-th element of the point-wise kernel.By incorporating both depth-wise and point-wise convolutions, Xception achieves a remarkable balance between computational efficiency and expressive power.
In addition to its architectural innovations, Xception employs other techniques, such as batch normalization and ReLU activations, to enhance model stability and convergence speed [13].These elements collectively contribute to Xception's exceptional performance in image classification tasks, making it a preferred choice for various applications, including medical image analysis.
In the subsequent sections, we delve deeper into the integration of Xception within our proposed deep learning paradigm, elucidating its role in revolutionizing automated disease detection from medical imaging data.

Transfer Learning
Transfer learning is a powerful concept in machine learning that leverages knowledge gained from one task to improve performance on a different but related task [14].In the context of deep learning and neural networks, transfer learning involves using pretrained models on large datasets and fine-tuning them for specific tasks.This approach is particularly beneficial when the target task has limited labeled data.
Transfer learning can be conceptualized as follows: let S denote the source domain, T denote the target domain, P(S) denote the probability distribution of the source domain, P(T) denote the probability distribution of the target domain, X denote the input space, and Y denote the output space.The model's objective is to learn a mapping f : X → Y that performs well on T based on the knowledge acquired from S.
Transfer learning encompasses various strategies, such as feature extraction and finetuning.Feature extraction involves using the pre-trained model's early layers as generic feature extractors and appending task-specific layers for the target task.Fine-tuning, on the other hand, refines the entire model on the target task by adjusting the weights of all layers while retaining the knowledge gained from the source task.

Transfer Learning Framework: Feature Extraction, Encoding, Decoding, and Feature Generation
In the transfer learning phase, we introduce various layers, including a feature extraction module that enables us to address the expressiveness issue resulting from different locations and times of image capture.This module precedes encoding and decision making for effective preprocessing.The methodology encompasses a feature-encoding module, a decoding module, and a feature-generation segment.This approach is both sequential and parallel, enhancing the overall efficiency of the process.
In our proposed model for COVID-19 detection, transfer learning plays a pivotal role in overcoming the challenges associated with the evolving nature of the SARS-CoV-2 virus.By leveraging pre-existing knowledge from a large dataset, the model can effectively learn relevant features for distinguishing between different strains, enhancing its accuracy and robustness in the face of emerging variants.The subsequent sections delve into the specifics of our novel model and the experimental validation conducted to demonstrate its superior predictive accuracy compared to established models.
In the quest for a groundbreaking solution to the crucial problem at hand, we introduce a novel deep learning paradigm meticulously designed to redefine the landscape of automated COVID-19 detection from X-ray thorax images.Our proposed model seamlessly integrates the robust Xception architecture for pattern recognition, offering a transformative approach to enhance diagnostic accuracy in the realm of healthcare.

Model Architecture
The architectural prowess of our proposed model is illustrated in Figure 1, encapsulating the fusion of cutting-edge techniques for feature extraction, transfer learning, and classification.In the spirit of innovation, we have strategically organized the model into distinctive blocks, each contributing to the overall efficacy.Our model's architecture, meticulously crafted to train on existing data and predict outcomes for new individuals, stands as a testament to the sophistication required in the healthcare domain.
Our model is subdivided into three blocks.The first contains the basic Xception model.The second contains a "GloabalAveragePooling2D" layer followed by four "BatchNormalization" layers, which provide data belonging to the same scale.This makes the neural network easier to train.Normalization, therefore, consists of formatting the input data to facilitate the machine learning process.These layers are separated by a dropout layer, with a rate of 0.5 to minimize the risk of overlearning, and a dense layer with 256 units linking all the layers of the network.The third and final block consists of a dense layer with a "softmax" activation function.

Layer Model
Considering the combination of two functions ϖ and φ to produce a new function ϖ × φ, we define the convolution model in Equation ( 3) To this, we add its decomposition into two distinct stages: a depth convolution, applying a spatial filter of size J × J to each input channel, and a point convolution, then applying a 1 × 1 filter to all output channels, allowing for the reduction or increase in the number of channels, ensuring depth-separable convolution.An Xception block is necessary in our approach, as it constitutes a basic unit of the Xception approach.To this end, we consider a succession of depth-separable convolution layers, followed by a batch normalization layer and an activation function, as illustrated by the layered model shown in Figure 2.
To add transfer learning, let us consider any task denoted by T as an image classification function defined by Equation (4), as follows: where V is the input space and W is the output space.
To predict the images, we then define f , a function that associates a numerical value with each element of V or W. To have a function that approximates the function to be learned, using adjustable parameters, we denote M as an image classification model such that M(v) defines the model prediction for image v ∈ V.
Considering that the source and target tasks share certain common features, which can be captured by the model, we then define two subsets as follows: The features are given by f 1 , f 2 , ..., f n ∈ F and the parameters by p 1 , p 2 , ..., p m ∈ P.
We have ∃F ⊂ f :

Integration of Xception Model: Rationale and Advantages
The decision to incorporate the Xception network into our model stems from its multifaceted advantages.Xception, evolving from Inception modules within convolutional neural networks, strategically positions itself as an intermediate step between conventional convolution and the depth-separable convolution operation.This distinctive characteristic empowers our model with unparalleled adaptability and expressive capacity, crucial for intricate nuances in COVID-19 pattern recognition within X-ray thorax images.
Our proposed Xception-Enhanced Transfer Learning Model represents a pioneering stride toward revolutionizing the diagnostic landscape in healthcare.By harnessing the strengths of Xception and seamlessly integrating them into our deep learning paradigm, we anticipate a paradigm shift in the accuracy and efficiency of automated COVID-19 detection.The subsequent sections delve into the experimental validation, results, and discussions, providing a comprehensive narrative of the model's performance and its potential impact on the broader healthcare domain.
Furthermore, while numerous image analysis methods, such as the YOLO model, demonstrate high performance in computer vision, our specific study opts for the Xception model due to its exceptional performance in the medical domain.Unlike other models, Xception introduces depth-wise separable convolutions, enhancing its capabilities.
The choice of Xception for our methodology is grounded in its prowess in medical applications and its innovative architectural modifications, contributing to improved computational efficiency without compromising performance.

Tools and Technologies Used
The realization of our work leveraged cutting-edge tools and technologies.On the hardware front, an HP Probook computer with a Windows 10 operating system, 64-bit architecture, Intel(R) Core i7-9700F CPU @ 3.00GHz, and 16 GB of RAM played a pivotal role.The software toolkit used in the experiment included TensorFlow 1.5.3,Keras 2.15.0,Scikit-learn 1.2.2,Scikit-image 0.19.3,Python 3.11.6,and Flask 3.0.2.

Dataset
To evaluate our model, we utilized X-ray images from COVID-19 patients sourced from multiple datasets, including hospital data related to the COVID-19 outbreak [15,16] and Kaggle data [16].We considered two datasets.The first one comprised 4050 images, with 3000 images for training and 1050 images for testing.The second one comprised 6378 images, with 4878 images for training and 1500 images for testing.These datasets included images from confirmed COVID-19 patients, normal individuals, and pneumonia patients.Figure 3 showcases a sample of X-ray images from these datasets.

Xception-Enhanced Transfer Learning Model
Figure 4 displays the performance results obtained by the Xception model using the same dataset (Table 1) over 10 epochs during training.
The proposed model, with 21,412,395 parameters, was first trained for 10 epochs utilizing the Adam optimizer (learning rate: 0.0001) and categorical cross-entropy for error computation.The key optimization techniques included LearningRateSchedule and ReduceLROnPlateau to dynamically adjust the learning rates.The model achieved an impressive accuracy of 96% in training and 97% in validation, with error rates of 0.07% and 0.06%, respectively.The classification report of our proposed model, as shown in Table 3, highlights exceptional performance across all classes (COVID-19, normal, and pneumonia), with precision and recall scores consistently exceeding 92.9%.The overall F1-score of 97.6% underscores the model's robustness in accurately classifying X-ray images, marking a significant advancement in the field of COVID-19 detection.Figure 5 presents the confusion matrix of the Xception model, showcasing its robust performance.Figure 6 summarizes the relationship between accuracy and loss for our proposed model during training and validation.These results were obtained using the dataset summarized in Table 1 over   We trained our model using a dataset comprising 3902 images distributed across three classes (COVID, normal, and pneumonia) for training data and 1500 for testing, necessitating a methodical approach (Table 2).Firstly, it was crucial to preprocess the images by resizing them to (224, 224, 3), normalizing the pixel values, and splitting them into distinct sets for training and validation.Subsequently, we compiled the model by defining the appropriate loss function (such as cross-entropy) and optimizer (Adam) to guide network learning.While we acknowledge that the typical input size for Xception models is 299 × 299 × 3, we resized our X-ray images to 224 × 224 × 3 during preprocessing for compatibility with our dataset.This resizing was conducted while considering the balance between computational efficiency and preserving essential diagnostic information.Despite the dimension reduction, standard resizing techniques were employed to maintain the integrity of the images.The model was trained on the training data over multiple iterations (epochs), totaling 21 epochs, where it adjusted its weights to minimize loss and enhance performance.Following training, it was crucial to evaluate the model on test data to assess its ability to generalize to new, unseen data.By analyzing metrics such as accuracy, precision, and recall for each class, we can assess the model's performance and identify areas for improvement.Finally, adjustments can be made to the model, such as tuning hyperparameters or augmenting data, to enhance its performance (Figure 7).4, by increasing the size of the dataset and the number of epochs, our model demonstrates even higher performance.In addition, an analysis of Figure 8a,b reveals deviations starting from the ninth epoch.To address this, we extended the training duration from 21 to 50 epochs.As depicted in Figure 10a,b, while occasional deviations persisted at certain epochs, the overall trend stabilized as training progressed toward the 50th epoch.These observations underscore the robustness and reliability of our findings, offering a deeper insight into the behavior and performance characteristics of the model.

Benchmark Models
To benchmark our model, we selected ResNet50, VGG-16, and Xception, presenting a comparative summary in Table 5.

Evolution of Neural Network Architectures
In contemporary computer vision applications, convolutional neural networks (CNNs) stand as a cornerstone, offering a versatile architecture characterized by alternating convolution and subsampling layers.The structural arrangement of these layers, complemented by innovations such as batch normalization [13] and dropout [17], significantly impacts the network's performance.Tailoring the layout of CNN layers plays a pivotal role in architectural design, influencing the overall efficacy of the model [18,19].
A myriad of pattern recognition models has emerged in the recent literature, each leveraging distinct neural network architectures.Pioneering models like LeNet [20], AlexNet [21], GoogleNet [22], ResNet [23,24], DenseNet [5], and Xception have demonstrated the evolution of neural network design.In the quest for enhanced performance, researchers have explored innovative strategies, combining data and prior knowledge in hybrid models [25].Noteworthy examples include VGG, which analyzes the impact of depth on accuracy through a network with up to 19 layers [26].Despite its success, VGG's computational complexity, with 138 million parameters, hinders deployment on resource-constrained systems.
In addressing this limitation, GoogleNet, also known as Inception-V1, introduced a block concept, leveraging convolutional transformations at multiple scales while minimizing computational cost [27].This approach replaces conventional layers with small blocks, reducing the parameter count from 138 million to 4 million.The model's heterogeneity, however, necessitates customization for each network.ResNet, proposed in [13], pioneered residual learning, featuring an architecture of up to 152 layers organized into residual blocks.ResNet exhibits reduced computational complexity and lower error rates in classification tasks.
The evolution of neural network architectures continued with Xception, introduced in [10] to enhance Inception-V3 modules through depth-separable convolutions.Departing from the original Inception-V3 block, Xception expands and replaces diverse convolution operations with a single 3 × 3 convolution followed by a 1 × 1 convolution, effectively regulating computational complexity.Notably, Xception has outperformed benchmarks such as VGG-16, ResNet, and Inception-V3 in traditional classification challenges, offering a resource-efficient alternative for clinical model development due to its commendable trade-off between resource consumption and accuracy [10].We provide an insightful overview of key models, namely ResNet50, VGG-16, and Xception, encompassing their architectural intricacies, intended purposes, performance metrics, and inherent limitations.The summarized information is presented in Table 6, offering a comprehensive comparison to contextualize our proposed model's advancements.-

Performance Evaluation and Benchmarking
In the evaluation and benchmarking results presented in Table 5, our innovative model, the Xception-Enhanced Transfer Learning Model, demonstrated superior accuracy compared to its counterparts.To further strengthen the credibility of our findings, we calculated the confidence intervals for key performance metrics including accuracy, precision, recall, and F1-score.These intervals offer valuable insights into the variability of our model's performance, enhancing the robustness of our results.With a confidence level of 95%, the calculated 95% confidence interval for the accuracy of our model was approximately 0.991 ± 0.00477, indicating a range from 0.98623 to 0.99577.The inclusion of these confidence intervals alongside our results provides a comprehensive assessment of the stability and reliability of our model's performance.Notably surpassing the highly acclaimed Xception model, our innovation has set a new standard in the field.Figures 4, 6 and 8 vividly depict the superior accuracy of our model compared to Xception.
Utilizing the dataset showcased in Figure 3, our model achieved an exceptional precision of over 98.8%, surpassing that of ResNet50 (60%), Xception (86.74%), and VGG-16 (92%), as highlighted in Table 5.The confusion matrices presented in Figures 5, 7 and 9 further underscore the superior performance of our model compared to Xception.
This stellar performance can be attributed to the strategic incorporation of transfer learning, specifically the synergy between Xception and our proposed model.The integration of transfer learning enhances the accuracy and overall efficacy of our model.The resource-efficient nature of our model, coupled with its ability to achieve robust performance with a modest amount of labeled training data, distinguishes it as the pinnacle of current machine learning models.

Conclusions
In this study, we introduced the Xception-Enhanced Transfer Learning Model as a novel approach for precise COVID-19 detection from X-ray images.Leveraging the power of transfer learning, our model has demonstrated exceptional performance across various metrics, surpassing established benchmarks and setting a new standard in diagnostic accuracy.
Our model's success is evidenced by its outstanding performance compared to other models, as summarized in Table 5.With a training accuracy of 96% and a validation accuracy of 97%, our model consistently outperforms ResNet50, VGG-16, and even the baseline Xception model.Furthermore, our model exhibits impressive recall and precision rates of 97% and 98.8%, respectively, highlighting its robustness in correctly identifying COVID-19 cases while minimizing false positives.
By harnessing transfer learning, our approach not only achieves superior accuracy but also addresses key challenges in model development.The utilization of pre-trained models, such as Xception, significantly reduces the need for extensive labeled data, making our model both resource-efficient and scalable for deployment in real-world settings.Moreover, the integration of transfer learning enhances our model's adaptability to diverse datasets and medical imaging tasks, paving the way for future advancements in diagnostic methodologies.
Our findings underscore the transformative potential of advanced machine learning techniques in combating global health crises.The Xception-Enhanced Transfer Learning Model represents a paradigm shift in COVID-19 detection, offering enhanced diagnostic capabilities that can significantly impact patient care pathways.Beyond its immediate implications for COVID-19 diagnosis, our model lays the foundation for the development of innovative diagnostic tools in healthcare, promising improved outcomes and better management of infectious diseases.

Figure 1 .
Figure 1.Architecture of the proposed Xception-Enhanced Transfer Learning Model.In blue is the feature extraction block, in yellow is the transfer learning block, and in orange is the classification block.
where M S is the source model associated with the source task T S = (V S , W S ) and M C is the target model used by the target task T C = (V C , W C ).

Figure 2
illustrates our proposed layer model, showcasing the intricate composition that underlies the model's ability to navigate the complexities of X-ray thorax images.The layer model emphasizes the interplay between feature extraction, transfer learning, and classification layers, providing a comprehensive insight into the neural architecture's depth and sophistication.

Figure 2 .
Figure 2. Layer model of our proposed Xception-Enhanced Transfer Learning Model.

Figure 3 .
Figure 3. Sample of images from the dataset used [16].

Figure 4 .
Figure 4. (a) Accuracy and (b) loss for the Xception model using the X-ray images dataset (10 epochs).
Figure5presents the confusion matrix of the Xception model, showcasing its robust performance.Figure6summarizes the relationship between accuracy and loss for our proposed model during training and validation.These results were obtained using the dataset summarized in Table 1 over 10 epochs during training.

Figure
Figure 8a,b depict the training and validation accuracy, respectively, after training our model with 80% of the data used, along with the error rates for 80% of the training data.The corresponding confusion matrix, as illustrated in Figure 9, visualizes the model's performance, displaying the number of correct and incorrect predictions made by the model for each class compared to the true labels of the data.This matrix is useful for evaluating accuracy, recall, specificity, and other model performance metrics, helping in the identification of classification errors and areas where the model can be improved.As shown in Table4, by increasing the size of the dataset and the number of epochs, our model demonstrates even higher performance.

Figure 8 .
Figure 8.(a) Accuracy and (b) loss for the Xception-Enhanced Transfer Learning Model using the X-ray images dataset (21 epochs).

Figure 10 .
Figure 10.(a) Accuracy and (b) loss for the Xception-Enhanced Transfer Learning Model using the X-ray images dataset (50 epochs).

Table 1
displays the distribution by class of the first dataset with 3000 images (training), including 2400 images for training and 600 images for validation.

Table 2
presents the class distribution of the second dataset with 4878 images (train), including 3902 training images and 976 images for validation.

Table 3 .
Classification report of our proposed model over 10 epochs based on the data presented in Table1.COVID-19 is class 0, normal is class 1, and pneumonia is class 2.

Table 4 .
Classification report of our proposed model over 21 epochs based on the data presented in

Table 5 .
Comparative table of the results obtained for different models.