4.1. Experiment Setup and Evaluation Metrics
The experiments were executed on a computer system running Windows 11 Pro, equipped with 32 GB of RAM to provide sufficient processing capacity for the demands of the algorithms. We used an NVIDIA RTX 3060 GPU with 12 GB of onboard memory for graphical processing. The implementation was written in Python 3.6.0, with Keras as the primary neural network library and TensorFlow as the backend framework. For model training, the CycleGAN model was trained for 300 epochs to ensure adequate learning and adaptation to the input data, while the CNNs were trained for 30 epochs. The Adam optimizer was selected for both models because of its adaptive learning rate, which improves the efficiency of network weight updates [38,39]. A mini-batch size of 16 was chosen [40]; this size balances computational demand against stable convergence, allowing the model to benefit from the stochastic gradient descent approach while mitigating the risk of unstable training dynamics. The learning rate was set to 0.0001 for gradual convergence, which is crucial for achieving reliable training outcomes. Cross-entropy served as the loss function, an effective choice for classification problems because it quantifies the difference between the predicted probabilities and the actual categorical distribution [41].
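To make these settings concrete, the following is a minimal sketch of a comparable Keras/TensorFlow training configuration. The `build_classifier` helper, the input size, and the commented `fit` call are illustrative assumptions, not the authors' code.

```python
# A minimal sketch of the classifier training configuration described above
# (Adam, learning rate 1e-4, categorical cross-entropy, mini-batch size 16,
# 30 epochs). The model construction and data are placeholders.
from tensorflow import keras

NUM_CLASSES = 4            # COVID-19, normal, opacity, viral
IMG_SHAPE = (224, 224, 3)  # assumed VGG16 input size

def build_classifier():
    # Hypothetical stand-in for the (modified) VGG16 backbone.
    base = keras.applications.VGG16(include_top=False, weights=None,
                                    input_shape=IMG_SHAPE, pooling="avg")
    outputs = keras.layers.Dense(NUM_CLASSES, activation="softmax")(base.output)
    return keras.Model(base.input, outputs)

model = build_classifier()
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-4),  # gradual convergence
    loss="categorical_crossentropy",                      # one-hot targets
    metrics=["accuracy"],
)
# model.fit(x_train, y_train, batch_size=16, epochs=30,
#           validation_data=(x_val, y_val))
```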
To assess the efficacy of the proposed methodology, we employed performance metrics that are instrumental in evaluating the accuracy and robustness of categorization models [42]. These metrics included accuracy, which measures the overall correctness of the model across all categories and provides a straightforward indicator of its ability to predict the correct label for a given input. Precision evaluates the proportion of true positive predictions among all positive predictions made, highlighting the model's ability to minimize false positives. Recall, or sensitivity, gauges the network's capacity to correctly categorize all actual instances of a particular category, which is crucial in applications where missing a positive instance can have severe consequences. The F1-score, the harmonic mean of precision and recall, offers a single balanced score that reflects both the accuracy and completeness of the model's predictions. Additionally, the confusion matrix shows both the successes and the specific areas where the model confuses one category for another. This matrix is invaluable for visualizing the performance of a model across different categories and for identifying trends or biases in misclassifications, which can inform further refinements to the model architecture or training process.
These metrics are defined as
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \text{Precision} = \frac{TP}{TP + FP},$$
$$\text{Recall} = \frac{TP}{TP + FN}, \qquad \text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}},$$
where TP = true positive, TN = true negative, FP = false positive, and FN = false negative.
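For illustration, a minimal Python sketch of these per-class metrics, computed directly from the four counts; the example counts are hypothetical.

```python
# Evaluation metrics as defined above, for a single class,
# computed from true/false positive and negative counts.
def classification_metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Example with illustrative counts (not the paper's data):
print(classification_metrics(tp=95, tn=90, fp=5, fn=10))
```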
4.2. Performance of Categorization Models
To assess our methodology, we employed a test collection consisting entirely of images that had not been used in training. This approach ensures that the evaluation of both the standard and modified VGG16 models is conducted in an unbiased and objective manner, accurately reflecting their ability to generalize to new, unseen data. By using completely unseen images, we effectively mimic real-world scenarios in which the models must operate on data they have not previously encountered. Such an evaluation not only confirms the robustness of the models' predictive accuracy but also underscores their potential applicability in clinical environments, where the ability to interpret unfamiliar and variable medical images accurately is paramount.
As shown in Table 5, the performance metrics from the test collection for both the standard and modified VGG16 networks provide significant insights into their diagnostic abilities. The standard VGG16 network achieved an accuracy of 97.61% on the test collection, which indicates a high level of correctness in its predictions across various conditions. The precision of the model, standing at 97.37%, reflects its effectiveness in identifying true positive cases, suggesting that when it predicts a condition, it is usually correct. Furthermore, the recall rate of 97.93% highlights the network's capacity to identify most of the actual positive cases, which is crucial in medical diagnostic settings to avoid overlooking conditions that require intervention. The F1-score, which combines precision and recall into a single measure, was 97.64%, indicating a balanced accuracy in terms of both identifying conditions correctly and not missing significant cases. In comparison, the modified VGG16 model exhibits superior performance across all metrics on the test collection, underscoring the benefits of the modifications implemented. It achieved an accuracy of 98.58%, precision of 98.74%, recall of 98.77%, and an F1-score of 98.76%. These enhancements suggest that the modifications to the VGG16 network have substantially improved its diagnostic precision and reliability. The increased accuracy and precision indicate a more refined ability to categorize images correctly with fewer errors, while the elevated recall and F1-score imply improved comprehensive detection capabilities and a balanced sensitivity–specificity trade-off.
Table 6 provides a comprehensive classification report for both the standard and modified versions of the VGG16 model. The report includes precision, recall, and F1-score across the four diagnostic categories (COVID-19, normal, opacity, and viral), allowing each model's performance to be evaluated in a medical diagnostic context. For the COVID-19 category, the modified model shows higher precision and recall than the standard model, suggesting that the modifications to the VGG16 architecture have markedly enhanced its ability to accurately identify and confirm cases of COVID-19 while minimizing false negatives, which are particularly critical in managing the pandemic effectively. In diagnosing normal conditions, both models perform well, but the modified model demonstrates slight improvements in all metrics, indicating an enhanced capability to differentiate normal anatomical structures from pathological changes, which is crucial in reducing unnecessary medical interventions for healthy patients. The opacity category, often challenging due to subtle radiographic signs that must be distinguished from similar conditions, shows a noticeable improvement in the modified model. The increase in precision and recall suggests that the modifications provide better feature extraction capabilities that help distinguish opacity from other conditions more effectively. Viral conditions, which encompass less common and more diverse pathologies, also show improvement in the modified model, particularly in precision and recall.
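A per-category report of this kind can be produced with scikit-learn's `classification_report`; the sketch below uses placeholder labels and predictions, not the paper's test collection.

```python
# Per-class precision/recall/F1 report, analogous in structure to Table 6.
from sklearn.metrics import classification_report

labels = ["COVID-19", "normal", "opacity", "viral"]
# Illustrative ground truth and predictions (placeholders):
y_true = ["normal", "COVID-19", "opacity", "viral", "normal", "opacity"]
y_pred = ["normal", "COVID-19", "opacity", "viral", "opacity", "opacity"]

print(classification_report(y_true, y_pred, labels=labels, digits=4))
```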
In Table 7, the confusion matrix for the standard VGG16 model indicates robust performance, particularly in identifying normal lung conditions, with a large number of true positives (2034 out of 2081 normal cases). However, it exhibits certain weaknesses, such as misclassifying normal cases as opacity or viral and a less pronounced ability to distinguish COVID-19 from normal cases, as evident from the misclassifications. The modified VGG16 model, also shown in Table 7, demonstrates a significant improvement in overall classification accuracy. It notably enhances the detection of normal cases, increasing true positives to 2057 and reducing false positives compared with the original VGG16 model. These improvements are directly attributed to the modifications in the network architecture, which have enhanced the model's feature extraction layers. Importantly, the modified model also exhibits a decrease in cross-condition misclassifications between the opacity and viral conditions, indicating a refined sensitivity to the unique characteristics of these conditions.
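A confusion matrix like those in Table 7 is typically computed as follows; this sketch uses scikit-learn with placeholder arrays rather than the paper's predictions.

```python
# Multi-class confusion matrix: rows are true classes, columns are
# predicted classes, so off-diagonal entries are misclassifications.
import numpy as np
from sklearn.metrics import confusion_matrix

CLASS_NAMES = ["COVID-19", "normal", "opacity", "viral"]
y_true = np.array([1, 1, 0, 2, 3, 1])  # illustrative ground-truth indices
y_pred = np.array([1, 2, 0, 2, 3, 1])  # illustrative model predictions

cm = confusion_matrix(y_true, y_pred, labels=range(len(CLASS_NAMES)))
print(cm)
```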
In Figure 5a, the ROC curves for the standard VGG16 model exhibit high AUC values across all categories, indicating strong discriminative ability. Specifically, the model shows exceptional performance in detecting COVID-19, with an AUC of 0.9911, underscoring its capability to identify COVID-19-positive cases with high accuracy while maintaining a low rate of false positives. The performance in the other categories (opacity, normal, and viral) also yields high AUC values, suggesting that the network effectively distinguishes between these conditions and healthy or other pathological states. Comparatively, the modified VGG16 model, shown in Figure 5b, enhances these metrics further, as evidenced by its ROC curves. The AUC for COVID-19 reaches 0.9979, reflecting the model's enhanced sensitivity and specificity, traits that are critical in a clinical setting, especially for conditions with significant health implications such as COVID-19. Similarly, the AUCs for the normal, opacity, and viral conditions are notably higher than those of the standard model, which can be attributed to the refined model architecture, improved training procedure, and more sophisticated data handling and processing. These improvements suggest that the modifications to the VGG16 network have effectively addressed limitations in the original model's ability to differentiate between subtle radiographic features of various lung conditions.
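Per-class ROC curves and AUCs for a multi-class classifier are usually obtained in a one-vs-rest fashion, as in the sketch below; the softmax outputs here are randomly generated placeholders, since the paper's probability scores are not available.

```python
# One-vs-rest ROC/AUC per class from softmax probabilities of shape
# (n_samples, n_classes), analogous to the analysis behind Figure 5.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.preprocessing import label_binarize

CLASS_NAMES = ["COVID-19", "normal", "opacity", "viral"]
y_true = np.array([0, 1, 2, 3, 1, 0, 2, 3])          # illustrative labels
rng = np.random.default_rng(0)
y_prob = rng.dirichlet(np.ones(4), size=len(y_true))  # fake softmax outputs

y_bin = label_binarize(y_true, classes=range(len(CLASS_NAMES)))
for i, name in enumerate(CLASS_NAMES):
    fpr, tpr, _ = roc_curve(y_bin[:, i], y_prob[:, i])  # per-class ROC curve
    auc = roc_auc_score(y_bin[:, i], y_prob[:, i])
    print(f"{name}: AUC = {auc:.4f}")
```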
Table 8 presents the statistical analysis in terms of p-values and t-tests. The p-value for the standard VGG16 (0.0542) is slightly above the 0.05 threshold, indicating that the performance improvement observed for this model is not statistically significant. For the modified VGG16, the p-value (0.0426) is below the 0.05 threshold, suggesting that the improvements observed with this model are statistically significant and that the modifications applied to the VGG16 model have genuinely enhanced its performance. The standard model's t-statistic (1.4758) suggests a moderate difference from the modified model. The modified model's t-statistic (1.0214) is lower than that of the standard model, which is interesting given its significant p-value. This suggests that the modification yields a statistically significant improvement that is moderate rather than excessive in magnitude.
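The exact test procedure is not specified above, but a paired t-test over repeated accuracy measurements (e.g., per fold or per run) is one common way to obtain such t- and p-values; the score lists below are illustrative placeholders, not the paper's measurements.

```python
# Paired t-test comparing two models' accuracy scores across runs.
from scipy import stats

baseline_scores = [0.970, 0.974, 0.976, 0.972, 0.975]  # e.g., standard VGG16
modified_scores = [0.984, 0.986, 0.985, 0.987, 0.984]  # e.g., modified VGG16

t_stat, p_value = stats.ttest_rel(modified_scores, baseline_scores)
print(f"t = {t_stat:.4f}, p = {p_value:.4f}")
# p < 0.05 would indicate a statistically significant difference.
```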
As illustrated in Figure 6, we present three X-ray scans for which the model's predictions were incorrect. The first two scans were incorrectly categorized as showing opacity, with confidence scores of 57.38% and 63.72%, respectively, despite being clinically normal. The third image, which shows characteristics typical of opacity, was inaccurately categorized as normal with a confidence of 58.84%. These examples were selected to showcase the variability in the model's performance, particularly in borderline cases that may exhibit characteristics of opacity or of normal lungs but do not meet the clinical criteria for a definitive diagnosis.
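Misclassified cases and their confidence scores of this kind can be pulled directly from the softmax outputs, as in the following sketch; all arrays are illustrative placeholders that loosely echo the reported confidences.

```python
# Identify misclassified test images and report the predicted class
# together with the model's confidence (maximum softmax probability).
import numpy as np

CLASS_NAMES = ["COVID-19", "normal", "opacity", "viral"]
y_true = np.array([1, 1, 2])                    # illustrative true indices
y_prob = np.array([[0.10, 0.25, 0.574, 0.076],  # fake softmax outputs
                   [0.05, 0.25, 0.637, 0.063],
                   [0.06, 0.588, 0.30, 0.052]])

y_pred = y_prob.argmax(axis=1)
conf = y_prob.max(axis=1)
for i in np.where(y_pred != y_true)[0]:
    print(f"image {i}: true={CLASS_NAMES[y_true[i]]}, "
          f"predicted={CLASS_NAMES[y_pred[i]]} ({conf[i]:.2%})")
```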