Fighting together against the pandemic: learning multiple models on tomography images for COVID-19 diagnosis

The great challenge facing humanity in 2020 is the fight against COVID-19. The whole world is making a huge effort to find an effective vaccine to protect people not yet infected. Until one is available, the alternative remains early diagnosis, carried out through the real-time polymerase chain reaction (RT-PCR) test or thorax computer tomography (CT) scan images. Deep learning algorithms, specifically convolutional neural networks, are an established methodology for image analysis. They optimize the classification design task, essential for an automatic approach on different types of images, including medical ones. In this paper, we adopt pretrained deep convolutional neural network architectures to diagnose COVID-19 on CT images. Our idea is inspired by what humanity as a whole is achieving in the fight against the pandemic: multiple contributions combined are better than any single one. Firstly, we adapt, and subsequently retrain for our purpose, several neural architectures adopted in other application domains. Secondly, we combine the knowledge extracted from the images by these architectures in an ensemble classification context. The experimental phase is performed on a CT image dataset, and the results obtained show the effectiveness of the proposed approach with respect to state-of-the-art competitors.


Introduction
The proliferation of the new coronavirus, hereafter COVID-19, is the current threat to humanity; it has spread rapidly around the world since January 2020. The 30th of January 2020 is a reference date for history, because on that day the World Health Organization (WHO) declared the official start of an international public health emergency, better known as a pandemic. Currently, no country is immune to the virus and, clearly, the situation appears to be critical. The virus manifests itself 5-6 days after the onset of the disease with specific and non-specific symptoms. The former are fever, dry cough, sore throat, and loss of taste or smell, while the latter are fatigue, headache and breathlessness. Unfortunately, COVID-19 has also occurred in animals through transmission between them. Looking back, viruses with similar behavior are the Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV) and the Middle East Respiratory Syndrome Coronavirus (MERS-CoV), both associated with major respiratory problems. Currently, the medical protocol takes more than 24 hours to detect the virus in the human body. It is important to detect the disease in its starting phase in order to isolate the infected person, because there is no effective cure. Diagnosis can be made through the real-time polymerase chain reaction (RT-PCR) test. RT-PCR is not very reliable due to its high false negative rate and long finalization time. Conversely, COVID-19 can be "detected" in healthy people due to false positives. It is clear that the low sensitivity of the RT-PCR test is not satisfactory in the current pandemic situation. In some cases, infected people are not recognized in time and do not receive adequate care. As an alternative to RT-PCR, Thorax Computer Tomography (CT) is probably a more reliable, effective, and faster approach for virus detection and treatment. CT image screening is available in almost all hospitals and can be adopted for a first analysis of the virus.
Unfortunately, analyzing Thorax CT images requires a radiologist, and much precious time is lost. Therefore, the automated analysis of Thorax CT images can speed up the diagnosis in order to help specialist medical staff, above all to avoid delays in the start of treatment.
In the last few years, deep learning has proven effective for the management, analysis, representation and classification of medical images. In particular, the success of deep neural networks applied to the image classification task is connected to several factors, such as the spread of open-source software, the constant growth of hardware power, and the availability of large datasets. Specifically, for the treatment of COVID-19, deep neural networks are adopted in both the segmentation and detection phases. However, uncertainty in COVID-19 diagnosis and data imbalance have a decisive impact on performance, hampering model generalization. In order to address the above issues, we introduce a framework based on deep transfer learning and ensemble classification for COVID-19 diagnosis. It works in three integrated stages: a first, which performs image preprocessing operations such as resizing and augmentation; a second, which redesigns and retrains multiple deep neural networks; a third, which combines the predictions provided by the deep neural networks with the aim of making the best decision (COVID-19/not-COVID-19). The framework provides the following main contributions:
• A deep and ensemble learning based framework that simultaneously addresses variation between classes and class imbalance in the COVID-19 diagnosis task.
• A framework that provides multiple classification models, based on deep transfer learning.
• The demonstration that multiple models, suitably combined, are better than a single model and can strengthen the decision made during diagnosis by a specialist doctor.
• Experimental improvements over existing methods on a recent state-of-the-art dataset for the COVID-19 detection task.
The paper is structured as follows. Section 2 provides an overview of the state of the art in COVID-19 classification approaches. Section 3 describes the proposed framework in detail. Section 4 provides a wide experimental phase, while Section 5 concludes the paper.

Related work
In this section, we briefly analyze the most important approaches to COVID-19 diagnosis currently existing in the literature. The field includes numerous works that address the task from different angles. Some offer important contributions to image representation, by implementing segmentation algorithms or new descriptors; others implement complex learning and classification mechanisms.
In [1] authors propose an architecture to improve performance in recognizing COVID-19 from chest radiograph images. It consists of two main components: image augmentation and transfer learning. This combination improves performance measurements such as accuracy, sensitivity, specificity, precision, and F1 score.
Authors in [2] present a multitask deep learning model to jointly identify COVID-19 patients and segment COVID-19 lesions from chest computed tomography images. The proposed architecture includes three phases: COVID-19 vs normal vs other infections classification, COVID-19 lesion segmentation, and image reconstruction. The algorithm flow is based on an encoder shared by the three tasks: it takes a CT scan as input, and its output is then fed to a first decoder for image reconstruction, to a second decoder for segmentation, and to a multilayer perceptron for the classification of COVID-19 vs normal vs other infections.
In [3] authors build a publicly available dataset containing hundreds of COVID-19-positive CT scans and implement a sample-efficient deep learning approach that can achieve high diagnosis accuracy on a limited set of training CT images. The approach integrates contrastive self-supervised learning with transfer learning to learn powerful and unbiased feature representations. To reduce the risk of overfitting, a large and consistent on-the-fly dictionary based on the contrastive loss is built to fulfill this auxiliary task.
In [4] authors propose a feature selection and voting classifier framework for COVID-19 CT image classification. Firstly, the features are extracted using a convolutional neural network (AlexNet). Secondly, a proposed feature selection algorithm, the Guided Whale Optimization Algorithm (SFS-Guided WOA) based on Stochastic Fractal Search (SFS), is applied, followed by a balancing algorithm. Finally, a voting approach, Guided WOA based on Particle Swarm Optimization (PSO), aggregates the predictions of different classifiers, such as Support Vector Machine (SVM), neural networks, K-Nearest Neighbor (KNN) and decision trees, and chooses the most voted class in an ensemble learning fashion.
In [5] authors design a neural architecture, called CTnet-10, for COVID-19 diagnosis from CT images. It is formed by a max-pooling layer of dimension 62 × 62 × 32, followed by 2 convolutional layers of dimensions 60 × 60 × 32 and 58 × 58 × 32 respectively, and a pooling layer of dimension 29 × 29 × 32. The last levels are a flatten layer connected to a fully connected layer of 4096 neurons, each followed by dropout. The final layer, a single neuron with sigmoid activation, classifies CT scan images as COVID-19 positive or negative. Test results are compared with well-known neural networks (DenseNet-169, VGG-16, ResNet-50, InceptionV3, and VGG-19).
In [6] authors build an open-source COVID-19 CT image dataset and a diagnosis method based on multi-task learning and self-supervised learning. To address the overfitting issue, they study two strategies, one of which is to add additional information, including segmentation masks of lung regions, and feed it into the feature extraction network.
In [7] authors conduct a retrospective study on chest CT scan images with the purpose of relating findings to the time between symptom onset and the initial CT scan. The hallmarks of COVID-19 infection on the images are bilateral and peripheral ground-glass and consolidative pulmonary opacities. With a longer time after the onset of symptoms, CT findings are more frequent, including consolidation, bilateral and peripheral disease, greater total lung involvement, linear opacities, crazy-paving pattern, and the reverse halo sign.
In [8] authors develop AI-based automated CT image analysis tools for the detection, quantification, and tracking of COVID-19. The system utilizes robust 2D and 3D deep learning models, modifying and adapting existing AI models and combining them with clinical understanding. The first step is the lung crop stage, in which the lung region of interest is extracted using a lung segmentation module. The following step detects COVID-19-related abnormalities using a deep convolutional neural network architecture. To overcome the limited size of the image dataset, data augmentation techniques (image rotations, horizontal flips and cropping) are applied.
In [9] authors propose a 3D deep convolutional neural network, named DeCoVNet, to detect COVID-19 from CT volumes. DeCoVNet is composed of three blocks. The first, called the network stem, consists of a vanilla 3D convolution, a batchnorm layer and a pooling layer. The second is composed of two 3D residual blocks (ResBlocks): in each ResBlock, a 3D feature map is passed both into a 3D convolution with a batchnorm layer and into a shortcut connection containing a 3D convolution, and the output feature maps are added element-wise. The third is a progressive classifier (ProClf), composed of three 3D convolution layers and a fully-connected (FC) layer with a softmax activation function; it progressively abstracts the information in the CT volume via 3D max-pooling and finally outputs the probability of being COVID-19 or not.
In [10] authors investigate the diagnostic value and consistency of chest CT in comparison with the RT-PCR test. For patients with multiple RT-PCR tests, the dynamic conversion of RT-PCR results (negative to positive, positive to negative) is analyzed against serial chest CT scans for those with a time interval of 4 days or more between RT-PCR tests. Chest CT has a high sensitivity for the diagnosis of COVID-19 and may be considered a primary tool for detection in epidemic areas.
In [11] authors examine the sensitivity, specificity, and feasibility of chest CT in detecting COVID-19 compared with the RT-PCR test. The sensitivity and specificity of chest CT in its various steps are compared using RT-PCR as the gold standard. A reverse calculation approach then takes chest CT as a hypothetical gold standard and compares RT-PCR to it, to point out the flaws of the standard approach. The study aims to prove that the sensitivity and specificity of chest CT in COVID-19 diagnosis and the radiation exposure have to be taken into account together.

Materials and Methods
In this section we introduce the proposed framework, which combines two well-known methodologies: deep neural networks and ensemble learning. The main idea is to combine several deep neural networks in order to classify images. The result is a set of competitive models providing a range of confidence-scored decisions useful for making choices during classification. The framework is structured into three levels: a first, which performs preprocessing in terms of image resizing and augmentation; a second, which trains different deep neural networks, previously redesigned for the specific task; a third, in which the models provided by the deep neural networks are combined, through ensemble rules, for classification purposes. Finally, the framework iterates a predetermined number of times in a supervised learning context.

Image augmentation
Many approaches have been developed to address the complications associated with the limited amount of data in machine learning. Image augmentation [12] is a practical technique for increasing and/or varying the training set without acquiring additional images. The concept is simple: duplicate and/or modify the images with some kind of variation so that the extra samples help train the model. The general idea is to augment each image in a way that preserves the key features needed for prediction while reworking the pixels with some form of noise. Augmentation is harmful if it produces images very dissimilar to those used to test the model, so this process must be organized carefully. In the proposed framework, we have adopted random reflection, translation, and scaling in order to enhance and augment the image content. As described in the experimental section, this step turns out to be fundamental in improving the performance of the proposed approach.

Image resize
One of the constraints of neural networks is the fixed size of the input images. To this end, a resize step is performed based on the input layer dimension required by each deep neural network (details can be found in Table 1, column 5). Most networks need this step, but it does not meaningfully alter the image information content. Size normalization is essential because images of different or large dimensions cannot be processed in the network training and classification stages.
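For illustration, the resize step can be sketched as a minimal nearest-neighbour rescale of a single-channel image (a toy stand-in; the framework itself relies on its toolbox's resize routine, and the interpolation choice here is our assumption):

```python
def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2-D image (list of pixel rows) to the
    dimensions required by a network's input layer."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
        for i in range(out_h)
    ]
```

In practice each color channel would be resized the same way, with a smoother interpolation (e.g. bilinear) to reduce aliasing.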

Network design and transfer learning
The transfer learning approach has been selected for classification purposes. The basic idea is to transfer the knowledge extracted from a source domain to a destination one, in our case COVID-19 diagnosis. Generally, a pretrained network is chosen as the starting point for learning a new task; it is the easiest and fastest way to exploit the representational power of pretrained deep networks. Clearly, tuning a network with transfer learning is much faster and easier than training a new network from scratch with randomly initialized weights. For COVID-19 diagnosis, the deep learning architectures are selected based on their structure and performance. The goal is to train the networks on images by redesigning their final layers according to the needs of the addressed task (two output classes: COVID-19 and not-COVID-19). Table 1 supports the description of the adopted networks provided below.
Resnet18 [13] is inspired by the pyramidal cells contained in the cerebral cortex. It uses particular skip connections, or shortcuts, to jump over some layers. It is 18 layers deep and, with the help of the skip connection technique, has paved the way for residual networks.
Densenet201 [14] is a convolutional neural network 201 layers deep. Unlike standard convolutional networks composed of L layers with L one-to-one connections between each layer and the next, it contains L(L+1)/2 direct connections. Specifically, each layer takes the feature maps of all preceding layers as input, and its own feature maps are fed into all subsequent layers.
Mobilenetv2 [15] is a convolutional neural network 53 layers deep. It is built on an inverted residual structure with shortcut connections between the thin bottleneck layers. The intermediate expansion layer uses lightweight depthwise convolutions to filter features as a source of non-linearity. Furthermore, non-linearities in the narrow layers are removed in order to maintain representational power.
Shufflenet [16] is a convolutional neural network 173 layers deep, designed for mobile devices with very limited computing power. The peculiarity of this architecture is the introduction of pointwise group convolution and channel shuffle operations, which reduce the computation cost while maintaining accuracy.
The deep neural networks have been adapted to the COVID-19 classification problem. Originally, their main training phase is performed on the ImageNet dataset [17], which includes over a million images divided into 1000 classes. The result is a rich feature representation for a wide range of images: the network processes an image and predicts the class to which it could belong, with an attached probability. Commonly, the first layer of the network is the image input layer, which requires images with 3 color channels. It is immediately followed by convolutional layers, which extract the image features. The last learnable layer and the final classification layer are used to classify the input image. To make a pretrained network able to classify new images, these two last layers are replaced with new ones. Frequently, the last layer with learnable weights is fully connected; it is removed and replaced by a new fully connected layer whose number of outputs matches the number of classes in the new data (COVID-19 and not-COVID-19). In addition, the learning of the new layer can be sped up, relative to the transferred layers, by increasing its rate factors. Optionally, the weights of the earlier layers can be left unchanged by setting their learning rate to zero. This prevents those weights from being updated during training and lowers the execution time, since the gradients of those layers do not have to be computed. This aspect has a strong impact on small datasets, helping to avoid overfitting.
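The freeze-and-replace scheme can be illustrated numerically. In the sketch below, a fixed random projection stands in for the transferred convolutional layers (their weights never change, mirroring a zero learning rate), and only a new two-output head is trained with softmax regression; all names, shapes, and the toy optimizer are our illustrative assumptions, not the framework's code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "pretrained" feature extractor: these weights are never updated,
# mirroring a learning rate of zero on the transferred layers.
W_frozen = rng.standard_normal((100, 64)) * 0.1

def extract_features(x):
    # ReLU features from the frozen layers (no gradients needed here).
    return np.maximum(x @ W_frozen, 0.0)

# New final layer: 2 outputs (COVID-19 / not-COVID-19), trained from scratch.
W_head = np.zeros((64, 2))

def train_head(x, y, lr=0.2, epochs=300):
    """Gradient descent on softmax cross-entropy, updating only the head."""
    global W_head
    for _ in range(epochs):
        f = extract_features(x)
        logits = f @ W_head
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        onehot = np.eye(2)[y]
        W_head -= lr * f.T @ (p - onehot) / len(x)   # only the head moves
```

Because gradients are computed only for the head, each step is cheap, which is the efficiency argument made above for freezing the transferred layers.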

Ensemble Learning
The contributions of the different deep neural networks are mixed in an ensemble context. Consider the set Imgs = {i_1, ..., i_k}, with cardinality k, of images belonging to x classes; each element of the set is treated with the procedure below. Let C be the set of n deep neural networks, combined to classify the images through the combination matrix CN, whose element in position (n, k) pairs network β_n with image i_k. Each β_n provides a decision d ∈ {−1, 1}, where 1 stands for not-COVID and −1 for COVID, for each i_k ∈ Imgs. The set of decisions D can be defined as follows:
\[
D = \begin{pmatrix} d_{\beta_1 i_1} & \cdots & d_{\beta_1 i_k} \\ \vdots & \ddots & \vdots \\ d_{\beta_n i_1} & \cdots & d_{\beta_n i_k} \end{pmatrix}
\]
It should be noted that each element of the matrix D corresponds, in terms of position, to the result of the deep neural network and image combination of the matrix CN, such that β_n i_k → d_{β_n i_k}. Furthermore, a score value s ∈ [0, 1] is associated with each decision d and represents the posterior probability P(i|x) that image i belongs to class x. The set of scores S can be defined as follows:
\[
S = \begin{pmatrix} s_{\beta_1 i_1} & \cdots & s_{\beta_1 i_k} \\ \vdots & \ddots & \vdots \\ s_{\beta_n i_1} & \cdots & s_{\beta_n i_k} \end{pmatrix}
\]
Also in this case, each element of the matrix S corresponds, in terms of position, to the deep neural network and image combination of the matrix CN with the related posterior probability, such that β_n i_k → d_{β_n i_k} → P(i_k|x). At this point, let us introduce the concept of mode, defined as the value occurring most frequently in a given set:
\[
\mathrm{Mode} = l + h \cdot \frac{f_1 - f_0}{2f_1 - f_0 - f_2}
\]
where l is the lower limit of the modal class, h is the size of the class interval, f_1 is the frequency of the modal class, f_0 is the frequency of the class preceding the modal class, and f_2 is the frequency of the class following it. The columns of the matrix D are analyzed by applying the mode, in order to obtain the most frequent decisions. This step verifies the best responses of the different deep neural networks contained in the set C. Moreover, the meaning of the mode is twofold: first, the most frequent value; second, its occurrences in terms of indices.
For each modal value, the corresponding scores are extracted from the matrix S. A new vector is generated,
\[
DS = \{ ds_{i_1}, \ldots, ds_{i_k} \},
\]
where each element ds contains the average of the decision scores with the highest frequency, extracted through the mode, in the related column of the matrix D. Also, the modal value of each column of the matrix D is stored in the vector
\[
DM = \{ dm_{i_1}, \ldots, dm_{i_k} \}.
\]
Each value dm contains the modal class to which image i could belong, with the average probability score ds. In essence, this is the class to which an image is assigned based on the votes cast by the different deep neural networks.
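The column-wise vote just described can be sketched as follows (a minimal NumPy illustration of the DM/DS construction; function and variable names are ours, not the framework's):

```python
import numpy as np

def ensemble_decide(D, S):
    """Combine per-network decisions by majority vote (mode).

    D: (n_networks, n_images) matrix of decisions in {-1, +1}
       (-1 = COVID-19, +1 = not-COVID-19).
    S: (n_networks, n_images) matrix of posterior scores in [0, 1].
    Returns DM (modal decision per image) and DS (mean score of the
    networks that voted for that decision).
    """
    n, k = D.shape
    dm = np.empty(k, dtype=int)
    ds = np.empty(k)
    for j in range(k):
        vals, counts = np.unique(D[:, j], return_counts=True)
        mode = vals[counts.argmax()]     # most frequent decision in column j
        voters = D[:, j] == mode         # networks that cast the modal vote
        dm[j] = mode
        ds[j] = S[voters, j].mean()      # average score of those voters
    return dm, ds
```

For example, if two of three networks label an image COVID-19 (-1), DM stores -1 for that image and DS stores the mean posterior score of those two networks.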

Experimental results
This section describes the experiments performed on a public dataset. In order to produce comparable performance figures, the settings reported in recent COVID-19 classification methods are adopted. The experimental phase is structured in two parts addressing the COVID-19 detection task. The first concerns a comparison with individual deep neural networks, in order to prove how a multiple model can provide better guidance than a single one. The second deals with recent methods that adopt a different logic from the proposed approach.

Dataset
The adopted COVID-19 CT dataset is publicly available and its details are described in [18]. It is composed of 746 Thorax Computer Tomography (CT) images, of which 349 contain clinical findings of COVID-19 from 216 patients and 397 are obtained from non-COVID-19 patients. The CT images are a collection selected from COVID-19-related papers published in medRxiv, bioRxiv, NEJM, JAMA, Lancet, and others. The reliability of this dataset has been validated by a senior radiologist of Tongji Hospital, Wuhan, China, who worked on the diagnosis and treatment of a large number of COVID-19 patients during the period of maximum emergency between January and April 2020.

Settings
The framework consists of different modules written in the Matlab language and extends the code available in [19]. The pretrained networks were taken from the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) [20]. Different combinations of networks were selected, because the test phase was split into runs with and without image augmentation. The choice is not random: it was made based on single-network performance and an in-depth analysis of the architectures (number of layers, applications in the literature, etc.); other combinations did not provide the expected feedback. Densenet201, Mobilenetv2 and Resnet18 were adopted with image augmentation, while Shufflenet, Resnet18 and Mobilenetv2 were adopted without image augmentation. Table 1 shows some important details of the adopted networks. Among all the computational stages, the training process was certainly the most expensive. As is well known, the networks contain fully connected layers that make the structure extremely dense and complex, which increases the computational load. In order to compare the results with those obtained in [19], the networks were trained by setting the mini batch size to 5, the maximum number of epochs to 6, the initial learning rate to 3·10^-4 (constant for the training stage), the momentum value to 0.9, the gradient threshold method to L2 norm, the factor for L2 regularization (weight decay) to 1·10^-4, the minimum batch size to 10, and the optimizer to stochastic gradient descent with momentum (SGDM). 80% and 20% of the images are randomly included in the train and test sets respectively, for a number of iterations equal to 5, with the aim of calculating the relevance feedback measures reported in Table 2. The data were shuffled before each training and validation epoch. The images were converted into RGB space and resized to match the input format of each pretrained network. The training was performed with and without image augmentation.
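For reference, the hyperparameters listed above can be collected in a single configuration object (a hedged sketch; the key names are illustrative and do not correspond to a specific toolbox API):

```python
# Training settings reported in the text, gathered as they might be passed
# to an SGDM trainer. Key names are our own; values come from the paper.
TRAIN_OPTIONS = {
    "optimizer": "sgdm",                 # stochastic gradient descent w/ momentum
    "momentum": 0.9,
    "initial_learning_rate": 3e-4,       # constant throughout training
    "l2_regularization": 1e-4,           # weight decay
    "gradient_threshold_method": "l2norm",
    "mini_batch_size": 5,
    "max_epochs": 6,
    "train_split": 0.8,                  # 80% train / 20% test, random
    "iterations": 5,                     # repeated runs for the reported metrics
    "shuffle": "every-epoch",
}
```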
Random reflection, translation and scaling were performed when image augmentation was enabled. For random reflection, each image was reflected in the top-bottom direction with probability 0.5. A horizontal and vertical translation was then applied to each image, with the translation distance selected randomly from a continuous uniform distribution within the range [−30, 30] pixels. Similarly, images were scaled vertically and horizontally by a scale factor selected randomly from a continuous uniform distribution within the range [0.9, 1.1]. Table 2 summarizes the metrics adopted for the performance evaluation. The goal is to provide a uniform comparison with approaches working on the same task and to understand, from the experimental phase, which information can be useful for COVID-19 diagnosis.
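As an illustration, sampling one set of augmentation parameters with the stated distributions might look as follows (the function and key names are our own; the framework applies these transformations within its Matlab pipeline):

```python
import random

def sample_augmentation(rng=random):
    """Draw one set of augmentation parameters as described in the text:
    top-bottom reflection with probability 0.5, translation in [-30, 30]
    pixels, and scale factors in [0.9, 1.1], all uniform."""
    return {
        "reflect_vertical": rng.random() < 0.5,
        "translate_x": rng.uniform(-30, 30),
        "translate_y": rng.uniform(-30, 30),
        "scale_x": rng.uniform(0.9, 1.1),
        "scale_y": rng.uniform(0.9, 1.1),
    }
```

A fresh parameter set would be drawn for every image in every epoch, so the network rarely sees the same pixels twice.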

Relevance feedback
The Sensitivity, also known as True Positive rate, concerns the portion of images containing COVID-19 disease elements that are correctly identified. This measure provides important information because it highlights the ability to detect images containing the disease and contributes to the robustness of the result. Likewise, the Specificity, also known as True Negative rate, measures the portion of negatives (images not containing COVID-19 disease elements) that have been correctly identified. Differently, Accuracy, a well-known performance measure, is the proportion of true results among the total number of cases examined; in our case it provides an overall analysis, certainly a rough measurement compared to the previous ones, of the ability of a classifier to distinguish an image of a patient affected by COVID-19 from an image of a patient not affected. Furthermore, F1 is defined as the harmonic mean of the precision and recall of the model. In addition, the AUC was calculated using trapezoidal integration to estimate the area under the ROC curve, and represents a measure of classifier performance. The ROC is a probability curve built by plotting the True Positive rate against the False Positive rate at different threshold values. The AUC value lies between 0.5 and 1, where 0.5 represents the performance of a random classifier and 1 indicates a perfect one; a high AUC value is a positive classification indication.

Metric        Equation
Sensitivity   TP / (TP + FN)
Specificity   TN / (TN + FP)
Accuracy      (TP + TN) / (TP + FP + TN + FN)
F1            2·TP / (2·TP + FP + FN)
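The tabulated formulas can be computed directly from the confusion-matrix counts, as in this minimal sketch:

```python
def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, Specificity, Accuracy and F1 from the confusion-matrix
    counts, matching the formulas above (assumes no denominator is zero)."""
    return {
        "sensitivity": tp / (tp + fn),                 # true positive rate
        "specificity": tn / (tn + fp),                 # true negative rate
        "accuracy": (tp + tn) / (tp + fp + tn + fn),   # overall correctness
        "f1": 2 * tp / (2 * tp + fp + fn),             # harmonic mean of P and R
    }
```

For example, 8 true positives, 2 false positives, 9 true negatives and 1 false negative give an accuracy of 17/20 = 0.85.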

Discussion
Tables 3 and 4 describe the comparison with different deep neural networks, with and without image augmentation respectively. The provided performance can be considered satisfactory compared to the different neural architectures. In terms of accuracy, although it provides a rough measurement, our approach achieves the best result both with and without image augmentation. Sensitivity, a measure that provides greater confidence about the addressed problem, is very high in both cases. Likewise, Specificity, which provides a high degree of information related to the absence of COVID-19 within the image, is the best value in both cases. Regarding the remaining measures, F1 score and AUC, considerable values were obtained. Table 5 provides comparison results with existing COVID-19 classification methods in terms of accuracy. As shown, the proposed approach is surpassed only by Ai et al. [10] and Fang et al. [11]; for the remaining methods, its performance is better. The effectiveness of the results can be attributed to two main aspects: the deep neural networks and the competitive classification model. First, the deep neural networks chosen for image learning and classification are the main strong point. The framework provides multiple learning models, which constitute a different starting point from a standard approach in which a single model is provided; this aspect is relevant for improving performance. Second, the classification stage offers multiple choices in decision making: at each iteration, the framework selects which networks are suitable for recognizing COVID-19 in the test-set images. Certainly, the computational load is greater, but it produces better results than a single-classifier approach. A non-negligible issue concerns the image size normalization required by the first layer of the neural networks before the learning phase, which here does not produce performance degradation.
In other cases, normalization does degrade image details, quality and content. The weak point remains the computational load, even if the pretrained networks include layers with already tuned weights. Certainly, the time required for the training stage is long and the computational resources are substantial, but less than for a network created from scratch. Moreover, the addressed binary classification was not greatly disadvantaged by class imbalance, because the ratio between the numbers of images per class is not very unbalanced. In many cases, data imbalance leads to low prediction accuracy for the minority class. It is an open problem, but many solutions exist, such as undersampling the majority class, oversampling the minority class using image augmentation, or weighting the loss function so that all classes contribute equally. This behavior is often seen in medical data due to the limited number of patient samples and the cost of acquiring annotated data. Furthermore, in the case of COVID-19 diagnosis it could be relevant, as patient data are not completely public.

Conclusions and Future Works
The challenge of COVID-19 detection is especially interesting, and not only when the data come from visual information. The complexity of the task is linked to different factors, such as the constant increase and variation of the data, given that the challenge is in full swing. In this setting, convolutional neural networks greatly help in understanding the meaning of the information inside the images, with the consequent goal of classifying them. In this context, we have proposed a framework that combines convolutional neural networks, adapted to the COVID-19 detection task through a transfer learning approach, using ensemble criteria. The results produced certainly support the theoretical thesis: a multiple model, based on different deep neural networks, is a stronger discriminator than a single one. The extensive experimental phase has shown how the proposed approach is competitive with, and in some cases surpasses, state-of-the-art methods. Certainly, the main weak point concerns the computational complexity of the learning phase, which, as is known, takes a long time, especially when the data to be processed grow. Future work will concern the study and analysis of convolutional neural networks still unexplored for this type of problem, and the application of the proposed framework to additional datasets, with the purpose of definitively winning the challenge against COVID-19.