Portable Chest X-ray Synthetic Image Generation for COVID-19 Screening

The global COVID-19 pandemic raises the importance of fast and reliable methods for early detection and for visualizing the evolution of the disease in each patient, both of which can be assessed with chest X-ray imaging. Moreover, to reduce the risk of cross contamination, radiologists are asked to prioritize portable chest X-ray devices, which provide lower quality and a lower level of detail than fixed machinery. In this context, computer-aided diagnosis systems are very useful. In recent years, for medical imaging, they have been widely developed using deep learning strategies. However, there is a lack of sufficiently representative datasets of the COVID-19 affectation, which are critical for supervised learning when training deep models. In this work, we propose a fully automatic method to artificially increase the size of an original portable chest X-ray imaging dataset specifically designed for COVID-19 diagnosis; the method can be trained in an unsupervised manner and without requiring paired data. The results demonstrate that the method performs reliable screening despite the problems associated with images provided by portable devices, achieving an overall accuracy of 92.50%.


Introduction
COVID-19, declared a global pandemic by the World Health Organization (WHO) in March 2020, mainly affects the respiratory tissues [1]. Chest X-ray imaging plays an important role in supporting the screening and early detection of the disease. In this context, radiologists are asked to prioritize portable chest X-ray devices, which are important for reducing the risk of cross contamination [2]. However, these devices provide lower quality and a lower level of detail than fixed machinery [3]. In this critical scenario, computer-aided diagnosis (CAD) systems can be very useful for clinical practice. In recent years, such diagnostic systems in biomedical imaging have usually been developed with computer vision and machine learning techniques and, in particular, deep learning strategies, which have grown in importance. However, in a supervised learning setting, deep learning models require a great amount of labeled data for training.
Regarding medical imaging, data scarcity must be taken into account, as it often critically limits the amount of labeled data. One way to overcome data scarcity is to generate synthetic images with generative network architectures, such as the many variants of Generative Adversarial Networks (GANs) [4]. One example of this kind of GAN model is the CycleGAN, a model able to translate images from one domain to another without requiring paired examples.
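As a concrete illustration of the CycleGAN idea, the sketch below computes the cycle-consistency loss that couples the two translation directions. The mappings G and F here are hypothetical toy functions standing in for the actual convolutional generators; this is a minimal sketch of the loss computation under those assumptions, not the model trained in this work.

```python
import numpy as np

# Toy illustration of cycle consistency: a generator G (domain A -> B) and an
# inverse generator F (domain B -> A) should undo each other, so translating
# an image forth and back must reproduce the original. Here G and F are
# hypothetical linear maps, purely to show how the L1 cycle loss is formed.

def G(x):
    """Toy generator A -> B (e.g., normal -> pathological appearance)."""
    return x * 1.5

def F(y):
    """Toy inverse generator B -> A."""
    return y / 1.5

def cycle_consistency_loss(x_a, x_b):
    """L1 cycle loss: mean |F(G(a)) - a| + mean |G(F(b)) - b|."""
    loss_a = np.abs(F(G(x_a)) - x_a).mean()
    loss_b = np.abs(G(F(x_b)) - x_b).mean()
    return loss_a + loss_b

rng = np.random.default_rng(0)
x_a = rng.random((64, 64))  # stand-in for a chest X-ray patch from domain A
x_b = rng.random((64, 64))  # stand-in for a chest X-ray patch from domain B
loss = cycle_consistency_loss(x_a, x_b)
print(loss)  # essentially 0, because F exactly inverts G in this toy setting
```

In a real CycleGAN this term is minimized jointly with the two adversarial losses, which is what allows training without paired images.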
In this work, given the low availability of samples showing COVID-19 affectation, we present a novel approach to artificially increase the size of a portable chest X-ray image dataset for COVID-19 diagnosis, combining three different and complementary CycleGAN architectures to perform oversampling with an unsupervised strategy that does not require paired data.

Methodology
The presented methodology is divided into two parts. The first part performs the synthetic image generation. The second part uses the novel set of generated images to increase the size of the original dataset, which is then evaluated in a COVID-19 screening scenario.

Approaches for Data Augmentation
To increase the size of the original chest X-ray dataset, we considered three complementary scenarios, corresponding to all the possible pairings of the dataset classes. In the first scenario, normal vs. pathological, normal samples are translated to their pathological representation and vice versa. In the second scenario, normal vs. COVID-19, normal samples are converted to a hypothetical representation showing COVID-19 affectation and vice versa. In the third scenario, pathological vs. COVID-19, we perform the same task, converting pathological samples to COVID-19 and vice versa. It is important to remark that all the images from the original dataset are used to train the CycleGAN models [5].
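The oversampling stage described above can be sketched as follows. Every image is passed through the generators of the two class pairs its class participates in, so each original sample contributes two extra synthetic samples. The generator lookup and the identity stubs below are illustrative assumptions, not the paper's trained networks.

```python
from itertools import combinations

# Hypothetical sketch of the oversampling stage. Three CycleGANs cover the
# three class pairs; each provides a generator in both directions. The trained
# generators are stubbed here with identity functions for illustration.

CLASSES = ["normal", "pathological", "covid19"]

generators = {}
for a, b in combinations(CLASSES, 2):
    generators[(a, b)] = lambda img: img  # stub for the a -> b generator
    generators[(b, a)] = lambda img: img  # stub for the b -> a generator

def oversample(dataset):
    """dataset: list of (image, label) pairs.
    Returns the originals plus one synthetic sample per other class."""
    augmented = list(dataset)
    for img, cls in dataset:
        for target in CLASSES:
            if target != cls:
                synthetic = generators[(cls, target)](img)
                augmented.append((synthetic, target))
    return augmented

toy = [("img0", "normal"), ("img1", "covid19")]
print(len(oversample(toy)))  # 2 originals + 2 synthetics each -> 6
```

With three classes, this triples the effective number of samples while keeping the class distribution balanced, which is the purpose of combining the three complementary scenarios.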

Approaches for Screening Tasks
In this second stage, we assess both the degree of separability among the generated images and the suitability of the novel set of generated synthetic images via the oversampled dataset. We used a Dense Convolutional Network (DenseNet) [6] model, pretrained on the ImageNet dataset, with the same training details as stated in [7,8], given their suitability for this particular problem.

Results and Conclusions
The chest X-ray image dataset was provided by the Radiology Service of the Complexo Hospitalario Universitario de A Coruña (CHUAC) and comprises images from 600 patients divided into three classes [9]: 200 normal cases (i.e., from patients without evidence of pulmonary pathologies), 200 pathological cases (i.e., from patients with pulmonary pathologies other than COVID-19) and 200 genuine COVID-19 cases.
To demonstrate the separability and suitability of the generated synthetic images, we conducted four experiments: the first three assess the separability among the generated images, and the fourth assesses the suitability of the novel set of generated images by evaluating screening with the oversampled dataset. The first three experiments demonstrate proper separability among the generated images in the three possible scenarios. In the fourth experiment, the model obtained a global test accuracy of 0.9250. Additionally, Figure 1 shows the performance of the model on the test set for all four experiments, with remarkable correct classification ratios in every case.

Institutional Review Board Statement:
The study was approved by the Ethics Review Board and the Data Management Technical Commission of the Galician Health Ministry for high-impact studies, with protocol code 2020-007.

Conflicts of Interest:
The authors declare no conflict of interest.