Detection and Severity Classification of COVID-19 in CT Images Using Deep Learning

Detecting COVID-19 at an early stage is essential to reduce the mortality risk of patients. In this study, a cascaded system is proposed to segment the lung and to detect, localize, and quantify COVID-19 infections from computed tomography (CT) images. An extensive set of experiments was performed using Encoder–Decoder Convolutional Neural Networks (ED-CNNs), namely U-Net and Feature Pyramid Network (FPN), with different backbone (encoder) structures based on variants of DenseNet and ResNet. The lung region segmentation experiments showed a Dice Similarity Coefficient (DSC) of 97.19% and an Intersection over Union (IoU) of 95.10% using the U-Net model with the DenseNet161 encoder. Furthermore, the proposed system achieved strong performance for COVID-19 infection segmentation, with a DSC of 94.13% and an IoU of 91.85% using the FPN with the DenseNet201 encoder. The proposed system can reliably localize infections of various shapes and sizes, especially small infection regions, which are rarely considered in recent studies. Moreover, the proposed system achieved high COVID-19 detection performance with 99.64% sensitivity and 98.72% specificity. Finally, the system was able to discriminate between different severity levels of COVID-19 infection over a dataset of 1110 subjects, with sensitivity values of 98.3%, 71.2%, 77.8%, and 100% for mild, moderate, severe, and critical infections, respectively.


Introduction
The coronavirus disease 2019 (COVID-19) has become a global pandemic that affects many aspects of human life. As of 11 January 2021, more than 88.8 million confirmed cases and 1.92 million deaths had been recorded, and the infection rate was still increasing rapidly worldwide [1]. Several laboratory identification tools are used for the detection of COVID-19, such as real-time reverse transcription-polymerase chain reaction (RT-PCR) and isothermal nucleic acid amplification technology [2,3]. Currently, RT-PCR is considered the gold standard for detecting COVID-19 [4]. However, a high false alarm rate usually occurs due to sample contamination, sample damage, or virus mutations in the COVID-19 genome. Medical imaging can be considered a first-line investigation tool [5]. Several studies [6,7] suggested performing chest computed tomography (CT) imaging as a secondary test if suspected patients show symptoms after a negative RT-PCR finding. For instance, in Wuhan, China, among 1014 COVID-19 patients, 59% had positive RT-PCR results, but 88% had positive CT scans. Moreover, among the patients with positive RT-PCR results, the CT scans achieved a 97% sensitivity [8]. Thus, CT scans can detect COVID-19 with higher sensitivity than RT-PCR. In addition, CT images can show early lesions in the lung, and radiologists can use them for diagnosis. However, radiologists require significant diagnostic experience to distinguish COVID-19 from other types of pneumonia [9]. Radiologists need to carry out two tasks for COVID-19 patients: identification and severity quantification. The purpose of identification is to single out COVID-19 patients among other patients so that they can be isolated as early as possible. Severity quantification can help medical personnel prioritize the patients who will require emergency medical care. Carrying out both tasks demands considerable evaluation time from radiologists.
Thus, developing artificial intelligence (AI)-based solutions specific to identification and severity quantification of COVID-19 can offer a fast, efficient, and reliable alternative that can supplement conventional medical diagnostic strategies. Recent studies showed that state-of-the-art deep convolutional neural networks (CNNs) can achieve or exceed the performance of medical experts in numerous medical image diagnosis tasks, such as skin lesion classification [10], brain tumor detection [11], breast cancer detection [12], and lung pathology screening [13,14].

Related Work
In general, distinguishing COVID-19 from other types of pneumonia poses a unique difficulty compared to other lung-related tasks, such as tuberculosis screening, lung nodule detection, and lung cancer diagnosis. This difficulty arises from the high similarity between different types of pneumonia (especially in the early stage) and the large variations between stages of the same type. Powered by large annotated datasets and modern graphics processing units (GPUs), machine learning techniques, especially deep learning, have achieved breakthrough performance in several computer vision applications, such as image classification, object detection, and image segmentation. Recently, deep learning techniques applied to chest CT scans and chest X-ray (CXR) images have become increasingly popular for diagnosing different lung diseases, showing promising results in various applications. Several studies have been published on CT-based COVID-19 diagnosis systems using machine learning models [15–19]. Several representative studies are summarized and reviewed below. Harmon et al. [20] trained and evaluated a series of deep learning networks on a diverse multi-national cohort of 922 COVID-19 cases and 1695 non-COVID patients to localize lung parenchyma, followed by identification of COVID-19 pneumonia. AH-Net was utilized for lung volume segmentation, achieving a Dice similarity coefficient (DSC) of 95%, while 3D-DenseNet-121 was employed to classify lung regions as COVID-19 or non-COVID. The average score of multiple lung regions was utilized for the classification scheme, achieving 88.9% accuracy, 85.3% sensitivity, and 90.1% specificity. Wang et al. [21] introduced a deep regression framework for automatic pneumonia identification by jointly learning from CT scan images and clinical information (i.e., age, gender, and clinical complaints). A Recurrent Neural Network (RNN) with ResNet50 as the backbone was used to extract visual features from CT images.
The initial clinical information collected from admitted patients (fever, cough, trouble in breathing, etc.) was analyzed by a Long short-term memory (LSTM) network and concatenated with demographic features (age and gender) and extracted visual features from CT images. Finally, a regression framework was utilized to diagnose the suspected patient as Community-acquired pneumonia (CAP) or normal. The proposed framework was evaluated over 900 clinical cases (450 CAP and 450 normal), achieving accuracy, sensitivity, specificity, and F1-Score of 0.946, 0.942, 0.949, and 0.944, respectively. In a similar approach, Mei et al. [22] proposed a joint AI algorithm to combine chest CT findings with clinical data (symptoms, exposure history, and laboratory testing) to diagnose COVID-19 from non-COVID patients using a dataset of 905 cases. The joint model achieved high discriminative performance with 0.92 area under the curve (AUC), 84.3% sensitivity, and 82.8% specificity, outperforming a senior radiologist who achieved 0.84 AUC, 74.6% sensitivity, and 93.8% specificity. The drawback of combined systems is the availability of clinical information, especially when a large number of suspected patients are waiting to be diagnosed. Furthermore, the proposed studies do not show the infection location in the lung which can be useful for the medical personnel for longitudinal monitoring of the patients.
The aforementioned machine learning solutions with CT imaging were limited to only COVID-19 detection. However, COVID-19 pneumonia screening is important for evaluating the status of the patient and treatment. In particular, COVID-19 related infection localization and the segmentation of pneumonia lesions is a crucial task for accurate diagnosis and follow-up of pneumonia patients. Zhou et al. [23] proposed a lesion detection system that can quantify COVID-19 infection regions from the chest CT scans. Three independent two-dimensional (2D) U-Nets are used for x-y, y-z, and x-z views of CT scan, where for each model, five adjacent slices are used as an input, while the network outputs infection prediction mask for the middle slice. The three intermediate binary predictions are aggregated by a simple sum up, with a threshold value of 2 to detect infection pixels. Moreover, to alleviate the data scarcity for annotated infection masks, a dynamic model was developed for data augmentation by simulating the progression of infection regions using multiple CT scan readings from the same patient. With the augmented data, the proposed system showed a performance of 78.3% DSC and 77.6% sensitivity. Besides, deep learning has a high potential to automate the lesion detection task, but requires a large set of high-quality annotations that are difficult to collect during the current pandemic. Learning from noisy training labels that are easier to generate has the potential to alleviate this problem. Wang et al. [24] introduced a novel framework to learn from noisy COVID-19 infection masks. They first proposed a new Dice loss metric, which integrates Dice loss and Mean Absolute Error (MAE). Then, a novel COVID-19 pneumonia lesion segmentation network (COPLE-Net) was developed that can segment COVID-19 infected regions with various scales and appearances. 
Moreover, an adaptive self-ensembling training strategy was proposed, which outperforms standard deep learning training strategies in scenarios of learning from noisy segmentation labels. The proposed framework achieved promising segmentation results with Dice, Relative Volume Error (RVE), and 95 percentile of Hausdorff Distance (HD-95) of 80.72%, 15.96%, and 17.12 ± 29.35 mm, respectively. HD-95 is the distance between the segmentation results and the ground truth in 3D space. Wang et al. [25] proposed a weakly supervised deep learning framework for COVID-19 classification and infection localization. A three-stage framework was introduced, where, first, the lung regions were segmented using a pre-trained 2D U-Net model slice by slice; then, a proposed deep convolutional neural network (DeCoVNet) was used to classify the entire 3D lung volume to COVID-19 or non-COVID. Finally, COVID-19 lesions were localized by integrating the activation regions in the classification network obtained by gradient class activation map (Grad-CAM), with activation maps from the lung segmentation model obtained by a 3D connected components method (3DCC). Therefore, the proposed algorithm does not require annotated infection masks in the training phase, as no dedicated segmentation model is used for infection localization, while ground-truth masks were provided by a professional radiologist for test set only to evaluate the network performance. The introduced algorithm was trained and evaluated on a dataset of only 313 COVID-19, and 229 non-COVID cases achieving classification results of 0.959 AUC value, and 0.976 precision-recall (PR)-AUC. However, a poor infection segmentation performance was reported with a 68.5% of hit rate (HR). Authors in [26] proposed a system that carried out lung and lesion segmentation for CT images using DRUNET, which provides a DSC value of 95.9% for lung segmentation. 
In contrast, a lesion segmentation network based on DeepLabv3 scored a mean DSC (mDSC) of only 58.7% [27]; the network was trained using 4695 CT slices for lung and lesion segmentation. Across the previous studies, the performance indicators for lesion segmentation are lower than those for lung segmentation, so there is room for improvement. The system in [23] would need a larger annotated dataset to improve its performance, as only 201 scans were used in the reported results. This is also a problem in other reported studies [25,26]. Furthermore, the authors of [23,25] processed the images as 3D volumes, which increases both the computing cost and the evaluation time.
For severity quantification, several studies suggested that deep learning can help quantify COVID-19 lung opacification. Moreover, it can eliminate the subjectivity in the initial assessment of COVID-19 patients. Chaganti et al. [28] presented a method that automatically segments and quantifies abnormal CT patterns in COVID-19 patients. The proposed system utilized 9749 chest CT volumes, segmented the lesions, lungs, and lobes, and used four metrics for severity quantification: percentage of opacity, percentage of high opacity, lung severity score, and lung high opacity score. Despite the good performance, no clear evaluation metric for the segmentation network models was presented. Another work classified the severity into four classes (mild, moderate, severe, and critical) [16]. Lung and lesion segmentation were carried out using the U-Net model via commercial tools, with a median DSC of 0.85 for both models. Shen et al. [29] created a system that considers computer and radiologist evaluations to determine the COVID-19 patient severity. The computer approach consisted of four phases: segmentation of the lung and lobes, segmentation of the pulmonary vessels, filtering of the pulmonary vessels out of the lung region, and detection of infection. The lesion segmentation was done using thresholds and adaptive region growing. The work showed that the Pearson correlation between the computer and radiologist evaluations ranged from 0.7679 to 0.8373. However, the work was carried out using only 44 patients, and such a small sample can bias the evaluation of the computer approach. Pu et al. [30] created an automated system to quantify COVID-19 severity and progression using chest CT images; 120 patients were used to train and evaluate two U-Net models for lung and vessel segmentation. The proposed system achieved 95% and 81% DSC for lung and lesion segmentation, respectively.
It is notable that the model failed to deal with pneumonic regions that are very small and near the vessels. Besides, the work used small datasets for training and testing, and a total of 192 CT volumes were used in this work.
Although most of the reviewed studies showed good performance for both lung and infection segmentation tasks, they mainly used conventional U-Net architecture or other techniques that are based on image processing. However, recently, different variants of U-Net architecture and other encoder-decoder (E-D) CNN, such as feature pyramid network (FPN), with residual, dense blocks, or inception blocks have shown state-of-the-art segmentation results in various applications. Therefore, there is still room to investigate the capability of those architectures for lung detection and COVID-19 infection localization tasks. Besides, several studies used a small number of patients and CT images to train, test, and validate the proposed systems. Table 1 summarizes the results of segmentation and classification obtained by the recent studies in the literature, and the table highlights the dataset size and the main networks used in each study.

Motivation
Although the above studies have demonstrated some promising results by using chest CT for the diagnosis of COVID-19, there is room for improvement, particularly in lesion segmentation and severity detection. Several works addressed lung and lesion segmentation, as shown in the previous section, which can help physicians to diagnose COVID-19 accurately and to assess the treatment response. The performance of the lesion segmentation models is still low compared to lung segmentation. This work aims to propose a system to identify and classify the severity of COVID-19 patients into four levels: mild, moderate, severe, and critical infection. Besides, the work investigates different deep learning methods for detecting COVID-19 infected slices from CT volume. For segmentation, U-Net [35] and feature pyramid network (FPN) [36] models were investigated with different encoders to achieve the best performance for lung and lesion segmentation. ResNet18 [37], ResNet50, ResNet152 [37], DenseNet121, DenseNet161, and DenseNet201 [38] were used as the backbone encoder for the segmentation models. Additionally, a reliable method was proposed to identify COVID-19 slices from the prediction maps generated by infection segmentation models. Besides, COVID-19 infection is quantified by computing the percentage of infected lung pixels on the segmented lung CT slices. Finally, a 3D volumetric visualization is developed to show the overall infected area in the lungs. This work uses several datasets from 1139 patients (51,027 CT slices) for training and validation and thus the system dealt with different images from different devices with varying image quality levels.
The rest of the paper is organized as follows: Section 2 describes the methodology adopted for the study. The experimental setup and evaluation metrics are presented in Section 3. Section 4 presents the results, performs an extensive set of comparative evaluations among the networks employed, and discusses and analyzes the findings. Finally, the conclusions are drawn in Section 5.

Methodology
The proposed system consists of three main stages, as shown in Figure 1, where the segmentation of the lung from CT images is the first step. Transfer learning was used on the encoder layers, which were initialized with ImageNet weights, to train the segmentation networks. The input CT volumes are evaluated slice by slice. First, a binary lung mask is generated for the input CT slice using the first E-D CNN. Next, the lung is segmented using the generated mask and fed to the second E-D CNN, which identifies the infected lung regions. The generated infection mask is used to distinguish COVID-19 slices from normal slices, and the COVID-19 pneumonia lesions are localized using the generated lung and infection masks. Furthermore, the infection percentage in the lung is computed for the patient in order to classify the severity of the given volume into four classes. Finally, a visualization tool is used to visualize the infection areas within the patient's lungs. This section first presents the datasets used in this study. Then, it introduces the pre-processing techniques applied to these datasets, the different machine learning models investigated, and the quantification technique for COVID-19 infection.
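The slice-wise cascade described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `lung_model` and `lesion_model` are hypothetical callables standing in for the two trained E-D CNNs, the toy models below are simple thresholds, and slices are plain nested lists of pixel values.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]  # one CT slice as rows of pixel intensities
Mask = List[List[int]]     # binary mask: 1 = foreground, 0 = background


def apply_mask(image: Image, mask: Mask) -> Image:
    """Keep the pixels inside the mask and zero out everything else."""
    return [[px if m else 0.0 for px, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]


def cascade_slice(image: Image,
                  lung_model: Callable[[Image], Mask],
                  lesion_model: Callable[[Image], Mask]) -> Tuple[Mask, Mask, bool]:
    """Run one slice through the two-stage cascade."""
    lung_mask = lung_model(image)             # stage 1: binary lung mask
    lung_only = apply_mask(image, lung_mask)  # segmented lung fed to stage 2
    lesion_mask = lesion_model(lung_only)     # stage 2: binary infection mask
    is_covid = any(any(row) for row in lesion_mask)  # at least one infected pixel
    return lung_mask, lesion_mask, is_covid


# Toy stand-ins for the trained networks (hypothetical, thresholds only).
def toy_lung_model(img: Image) -> Mask:
    return [[1 if px > 0.1 else 0 for px in row] for row in img]


def toy_lesion_model(img: Image) -> Mask:
    return [[1 if px > 0.8 else 0 for px in row] for row in img]
```

Calling `cascade_slice(slice_pixels, toy_lung_model, toy_lesion_model)` returns the lung mask, the infection mask, and a per-slice COVID-19 flag; in practice the two toy functions would be replaced by the trained segmentation networks.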

CT Datasets
To train and evaluate the proposed system, four public datasets from different sources were used in this work (Table 2). A total of 1139 patients and 51,027 CT slices were used. The datasets are described below. The first dataset [39] consists of CT volumes from 20 patients, comprising 3520 CT images with ground truth lung masks and lesion masks. All the cases contain COVID-19 infections, and the infection in the lungs ranges from 0.01% to 59%. These images were labelled by two radiologists and verified by an experienced radiologist (5 to 10 years of experience), as mentioned in the dataset description; more than 1800 of the 3520 slices are infected. The second dataset is called "COVID-19 CT segmentation dataset" [40] and is based on volumetric CTs from Radiopaedia. It includes 9 patients with 829 slices along with their corresponding ground truth lung masks, which were created by expert radiologists. Another dataset was found on the Kaggle platform; it consists of 267 CT slices with their corresponding ground truth lung masks [41]. These images are non-COVID cases, as they were collected in 2017. Additionally, MosMedData [42] was used for external validation. The dataset was obtained between 1 March 2020 and 25 April 2020, and it was provided by medical hospitals in Moscow, Russia. It includes 1110 patients, of whom 50 have been annotated by experts to show infection areas in 784 CT slices. The dataset consists of normal cases (254 patients) and COVID-19 cases (856 patients), and the COVID-19 cases are split into 4 classes: CT1 (affected lung percentage of 25% or below, 684 patients), CT2 (from 25% to 50%, 125 patients), CT3 (from 50% to 75%, 45 patients), and CT4 (75% and above, 2 patients).
Diagnostics 2021, 11, x FOR PEER REVIEW 6 of 18

Pre-Processing
Three out of the four datasets provided the CT images in Neuroimaging Informatics Technology Initiative (NIfTI) format. However, different window levels (in Hounsfield units (HU)) were specified for the different datasets, so the image features are not consistent across datasets. Therefore, all NIfTI files were converted into Portable Network Graphics (PNG) images, and the image intensity values were normalized and mapped to pixel values in the range of 0-255; then, the intensity interval was adjusted for each dataset to create consistent image content. Finally, all images were resized to 256 × 256 for the segmentation tasks. Figure 2 shows sample images from each dataset.
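The intensity normalization step can be illustrated with a small sketch that clips HU values to a window and maps them linearly to 0-255. The window level and width used here (-600 HU and 1500 HU, a commonly used lung window) are assumptions for illustration only; the per-dataset intervals actually used are not given in the text. Loading NIfTI files and resizing to 256 × 256 would require an imaging library and are left out.

```python
from typing import List


def window_to_uint8(hu_slice: List[List[float]],
                    level: float = -600.0,   # assumed window level (HU)
                    width: float = 1500.0    # assumed window width (HU)
                    ) -> List[List[int]]:
    """Clip HU values to [level - width/2, level + width/2] and map to 0-255."""
    low = level - width / 2.0
    high = level + width / 2.0
    out = []
    for row in hu_slice:
        new_row = []
        for value in row:
            clipped = min(max(value, low), high)     # clamp to the window
            scaled = (clipped - low) / (high - low)  # normalize to [0, 1]
            new_row.append(int(round(255 * scaled)))
        out.append(new_row)
    return out
```

Applying the same window to every dataset is one simple way to obtain consistent image content before saving the slices as 8-bit PNG images.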

Network Models for Lung and Lesion Segmentation and Classification
Firstly, a deep learning model is developed to generate a lung mask for the input CT slice. The segmented lung is then fed to another deep learning model to identify the infection regions within the segmented CT image. The produced infection mask is used to detect COVID slices. Furthermore, the COVID-19 infection lesion is quantified by computing the percentage of infected lung pixels and visualized on the 3D volumetric model.
Lung parenchyma and COVID-19 infection segmentation were performed on CT slices using state-of-the-art deep Encoder-Decoder Convolutional Neural Networks (E-D CNNs), namely U-Net and FPN, with different backbone (encoder) models based on variants of DenseNet and ResNet. Several variants of the two backbone models were considered, from shallow to deep structures: ResNet18, ResNet50, ResNet152, DenseNet121, DenseNet161, and DenseNet201. The encoder-decoder architecture provides a powerful segmentation model that captures context in the contracting path and enables precise localization in the expanding path. For the U-Net architecture, a 1 × 1 convolution was utilized to map the output of the last decoding block to two-channel feature maps, to which a pixel-wise SoftMax activation function is applied to map each pixel into a binary class: background or lung for the lung parenchyma segmentation task, and background or lesion for the infection segmentation task.
FPN employs the encoder and decoder structure as a pyramidal hierarchy, where a prediction mask is made at each spatial level of the decoder path. In the final step, the predicted feature maps were up-sampled to the same size, concatenated, and convolved with a 3 × 3 convolutional kernel, and SoftMax activation was applied to generate the final prediction mask. Transfer learning was utilized on the encoder side of the segmentation networks by initializing the convolutional layers with ImageNet weights. The cross-entropy (CE) loss was used as the cost function for the segmentation networks:

$$\mathrm{CE} = -\sum_{k}\sum_{c} y_{k}\,\log p(x_{k}) \tag{1}$$

where $x_k$ denotes the kth pixel in the predicted segmentation mask, $p(x_k)$ denotes its SoftMax probability for class $c$, $y_k$ is a binary indicator that equals 1 if the ground-truth label of the kth pixel is $c$ and 0 otherwise, and $c$ denotes the class category, i.e., $c \in \{\text{background}, \text{lung}\}$ for the lung segmentation task, and $c \in \{\text{background}, \text{lesion}\}$ for the infection segmentation task.
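To make the pixel-wise SoftMax and cross-entropy computation concrete, the following sketch evaluates the loss for a handful of pixels with two-channel outputs. It is illustrative only: averaging over pixels is an assumed (common) convention, and the function names are hypothetical rather than taken from the authors' code.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable SoftMax over the class channels of one pixel."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pixel_cross_entropy(logits_per_pixel: Sequence[Sequence[float]],
                        labels: Sequence[int]) -> float:
    """Mean cross-entropy over pixels.

    logits_per_pixel[k] holds the two channel outputs for pixel k
    (index 0 = background, index 1 = lung or lesion); labels[k] is 0 or 1.
    """
    loss = 0.0
    for logits, label in zip(logits_per_pixel, labels):
        probs = softmax(logits)
        loss -= math.log(probs[label])  # only the true class contributes
    return loss / len(labels)
```

A pixel with equal logits for both classes contributes log 2 to the loss, while a confidently correct pixel contributes almost nothing.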

The Proposed Approach for COVID-19 Detection and Severity Classification
The detection of COVID-19 was performed based on the prediction maps generated by the lesion segmentation networks. Accordingly, a CT slice was classified as COVID-19 positive if at least one pixel was predicted as COVID-19 infection, i.e., $p(x_k) > 0.5$; otherwise, the slice was considered normal. The severity of the COVID-19 patient was classified into four classes based on the percentage of infected lung parenchyma: mild, moderate, severe, and critical infection. The Percentage of Infection (PI) was calculated as the infected area (the sum of white pixels in the infection mask) over the lung area for one CT slice. For the entire volume, the average PI over all slices was taken as the patient's severity percentage, and the patient was assigned to one of the four classes based on this percentage. Figure 3 demonstrates the calculation of the PI on one CT slice.
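The detection rule and the PI computation can be sketched as below. The severity cut-offs are an assumption: the text does not state them at this point, so the sketch borrows the 25%, 50%, and 75% bins of the MosMedData CT1 to CT4 classes described in the datasets section. Treating slices without lung pixels as 0% infection is likewise an assumption, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Mask = List[List[int]]  # binary mask: 1 = foreground (white), 0 = background


def is_covid_slice(prob_map: Sequence[Sequence[float]], threshold: float = 0.5) -> bool:
    """A slice is positive if at least one pixel probability exceeds the threshold."""
    return any(p > threshold for row in prob_map for p in row)


def percentage_of_infection(lung_mask: Mask, lesion_mask: Mask) -> float:
    """PI for one slice: infected pixels over lung pixels, in percent."""
    lung_pixels = sum(sum(row) for row in lung_mask)
    if lung_pixels == 0:
        return 0.0  # assumption: a slice without lung counts as 0% infected
    infected_pixels = sum(sum(row) for row in lesion_mask)
    return 100.0 * infected_pixels / lung_pixels


def patient_percentage(slices: Sequence[Tuple[Mask, Mask]]) -> float:
    """Average the per-slice PI over all slices of a CT volume."""
    values = [percentage_of_infection(lung, lesion) for lung, lesion in slices]
    return sum(values) / len(values)


def severity_class(pi: float) -> str:
    """Map a patient-level PI to a severity label (assumed MosMedData-style bins)."""
    if pi <= 25.0:
        return "mild"
    if pi <= 50.0:
        return "moderate"
    if pi < 75.0:
        return "severe"
    return "critical"
```

For a volume given as a list of `(lung_mask, lesion_mask)` pairs, `severity_class(patient_percentage(volume))` yields the patient-level label under these assumed thresholds.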

Experimental Setup
Classification and segmentation models were implemented using the PyTorch library with Python 3.7 on an Intel® (Santa Clara, CA, USA) Xeon® CPU E5-2697 v4 @ 2.30 GHz with 64 GB RAM and an 8-GB NVIDIA® (Santa Clara, CA, USA) GeForce® GTX 1080 GPU card. Segmentation models were trained using the Adam optimizer with a learning rate of $\alpha = 10^{-3}$, momentum updates of $\beta_1 = 0.9$ and $\beta_2 = 0.999$, and a mini-batch size of 4 images for up to 50 backpropagation epochs. An early stopping criterion was used: training was stopped when no improvement in validation loss was seen for 10 epochs. Table 3 presents the training hyper-parameters for the lung and infection segmentation models.
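The early stopping rule can be sketched independently of any deep learning framework. In the sketch below, `run_epoch` is a hypothetical callable that trains for one epoch and returns the validation loss; in a PyTorch setup it would wrap a training loop using an optimizer such as `torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))`. Interpreting "no improvement" as "no new minimum of the validation loss" is an assumption.

```python
from typing import Callable, Tuple


def train_with_early_stopping(run_epoch: Callable[[int], float],
                              max_epochs: int = 50,
                              patience: int = 10) -> Tuple[int, float, int]:
    """Train for at most max_epochs, stopping after `patience` epochs
    without a new best validation loss.

    Returns (best_epoch, best_loss, epochs_run); best_epoch is -1 if
    no epoch was run.
    """
    best_loss = float("inf")
    best_epoch = -1
    stale_epochs = 0   # epochs since the last improvement
    epochs_run = 0
    for epoch in range(max_epochs):
        val_loss = run_epoch(epoch)  # hypothetical: train one epoch, validate
        epochs_run += 1
        if val_loss < best_loss:
            best_loss = val_loss
            best_epoch = epoch
            stale_epochs = 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                break  # validation loss has stalled
    return best_epoch, best_loss, epochs_run
```

In a full training script, the model weights from `best_epoch` would typically be saved and restored after stopping; whether this was done here is not stated in the text.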

Data Preparation and Augmentation
Lung segmentation networks were trained using 5-fold cross-validation (CV), with 80% train and 20% test (unseen) folds, where 20% of the training data was used as a validation set to avoid overfitting. For infection segmentation, 10-fold cross-validation was used instead. Class imbalance in the dataset impacts the performance of deep learning models. Thus, data augmentation was used to balance the size of each class in the lung and lesion segmentation datasets and to reduce the risk of overfitting [43]. This step is crucial for the training phase to reduce the error from the lung segmentation task, which might otherwise propagate to the subsequent lesion segmentation task [44]. Data augmentation was performed by applying rotations of 90, −90, and 180 degrees to the CT images and their ground truth masks. Table 4 summarizes the number of images per class used for training, validation, and testing in each fold. Both networks were trained and evaluated independently: original CT slices were used as input to the lung segmentation models, and lung-segmented CT slices were used as input to the lesion segmentation networks, with the infection masks as ground truth. In addition, a combined evaluation was performed using the best lung segmentation and infection segmentation models to assess the overall performance of the proposed cascaded system.
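The rotation-based augmentation can be illustrated with a short sketch in which the same rotation is applied to a slice and its ground truth mask so that they stay aligned. Images are plain nested lists here, and treating +90 degrees as counter-clockwise is an assumed convention; the function names are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[float]]  # an image or a mask as rows of values


def rotate90_ccw(grid: Grid) -> Grid:
    """Rotate by +90 degrees (counter-clockwise, by assumption)."""
    return [list(row) for row in zip(*grid)][::-1]


def rotate90_cw(grid: Grid) -> Grid:
    """Rotate by -90 degrees (clockwise)."""
    return [list(row) for row in zip(*grid[::-1])]


def rotate180(grid: Grid) -> Grid:
    """Rotate by 180 degrees."""
    return [row[::-1] for row in grid[::-1]]


def augment_pair(image: Grid, mask: Grid) -> List[Tuple[Grid, Grid]]:
    """Return the three rotated (image, mask) pairs used for augmentation."""
    rotations = (rotate90_ccw, rotate90_cw, rotate180)
    return [(rotate(image), rotate(mask)) for rotate in rotations]
```

Each original sample thus yields three additional training pairs, which can be drawn selectively from the under-represented class to balance the dataset.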

Evaluation Criteria
Quantitative evaluations of the proposed approach were performed for the lung segmentation, infection segmentation, and COVID-19 detection tasks. The segmentation tasks were evaluated at the pixel level, where the foreground (lung or infected region) was considered the positive class and the background the negative class. For the COVID-19 detection task, the performance was computed per CT sample, where slices with COVID-19 infection were considered the positive class and normal slices the negative class.
The performance of the detection and segmentation tasks was assessed using different evaluation metrics with 95% confidence intervals (CIs). Accordingly, the CI for each evaluation metric was computed as follows:

$$r = z \sqrt{\frac{\mathrm{metric}\,(1 - \mathrm{metric})}{N}}$$

where N is the number of test samples and z is the critical value of the standard normal distribution, which is 1.96 for a 95% CI. All values were computed over the overall confusion matrix that accumulates all test fold results of the 5-fold or 10-fold cross-validation in the respective experiments. The performance of the lung and lesion segmentation networks was evaluated using three metrics: accuracy, Intersection over Union (IoU), and Dice Similarity Coefficient (DSC).

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

where accuracy is the ratio of correctly classified pixels to all image pixels, and TP, TN, FP, and FN represent the true positives, true negatives, false positives, and false negatives, respectively.

$$\mathrm{IoU} = \frac{TP}{TP + FP + FN}, \qquad \mathrm{DSC} = \frac{2TP}{2TP + FP + FN} \tag{5}$$

where both IoU and DSC are statistical measures of spatial overlap between the binary ground-truth segmentation mask and the predicted segmentation mask; the main difference is that DSC gives double weight to TP pixels (true lung/lesion predictions) compared to IoU. Five evaluation metrics were considered for the COVID-19 detection scheme: accuracy, sensitivity, precision, F1-score, and specificity.

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{6}$$

where precision is the rate of correctly classified positive class CT samples among all the samples classified as positive.

$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \tag{7}$$

where sensitivity is the rate of correctly predicted positive samples among all positive class samples.

$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Sensitivity}}{\mathrm{Precision} + \mathrm{Sensitivity}} \tag{8}$$

where F1 is the harmonic mean of precision and sensitivity.

$$\mathrm{Specificity} = \frac{TN}{TN + FP} \tag{9}$$

where specificity is the ratio of correctly predicted negative class samples to all negative class samples.
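As a worked illustration of these definitions, the sketch below computes all of the metrics from accumulated confusion-matrix counts. The function names (`confusion_metrics`, `ci_half_width`) are hypothetical, the counts in the usage example are made up, and the interval assumes the common normal-approximation form $r = z\sqrt{\mathrm{metric}(1 - \mathrm{metric})/N}$ with $z = 1.96$; returning 0.0 for an empty denominator is also a convention chosen for the sketch, not something stated in the paper.

```python
import math


def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def confusion_metrics(tp, tn, fp, fn):
    """Compute segmentation and detection metrics from confusion-matrix counts."""
    precision = _ratio(tp, tp + fp)
    sensitivity = _ratio(tp, tp + fn)
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "iou": _ratio(tp, tp + fp + fn),
        "dsc": _ratio(2 * tp, 2 * tp + fp + fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
        "specificity": _ratio(tn, tn + fp),
    }


def ci_half_width(metric, n, z=1.96):
    """Half-width r of the normal-approximation confidence interval.

    `metric` is a proportion in [0, 1]; `n` is the number of test samples.
    """
    return z * math.sqrt(metric * (1.0 - metric) / n)


if __name__ == "__main__":
    # Illustrative counts only; they are not taken from the paper's tables.
    counts = {"tp": 90, "tn": 95, "fp": 5, "fn": 10}
    n_samples = sum(counts.values())
    for name, value in confusion_metrics(**counts).items():
        r = ci_half_width(value, n_samples)
        print(f"{name}: {100 * value:.2f}% +/- {100 * r:.2f}%")
```

For a binary problem, the F1-score and the DSC reduce to the same expression, 2TP / (2TP + FP + FN); they differ only in whether the counts are pixels (segmentation) or CT samples (detection).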

Results and Discussion
This section describes the results of the lung and lesion segmentation, COVID-19 detection and severity classification, along with 3D lung modeling to visualize lung infections.

Lung Segmentation
The results of 5-fold cross-validation are tabulated in Table 5. For each model, the three encoders DenseNet121, DenseNet161, and DenseNet201 were the top-performing ones for lung segmentation. However, the FPN network with different encoders did not improve on the results of the corresponding U-Net architectures; U-Net is a standard network for segmentation tasks. U-Net with the DenseNet121, DenseNet161, and DenseNet201 encoders showed the best DSC performance for lung segmentation. DenseNet121 is the best-performing network for lung segmentation with IoU and DSC of 95.35% and 97.11%, respectively. The outputs of the top three networks compared with the ground truth are shown in Figure 4. It can be observed that the segmentation of U-Net with DenseNet121, DenseNet161, and DenseNet201 is highly consistent with the ground truth. An interesting observation is the ability of the three networks to create segmentation masks for small lung regions. This is considered a challenging task for deep networks; rows 2 and 3 of Figure 4 show that the networks can generate masks for slices with small lung regions. Although the lungs can be severely affected by COVID-19 lesions, the trained model successfully segmented the lung boundaries, as shown in Figure 4. This reflects the robustness of the model proposed in this study for lung segmentation. The authors of [22,45] discarded slices with a small lung area (less than 20% of the body part) during the pre-processing phase, whereas this work included such images in the training and testing sets.

Lesion Segmentation
The segmentation performances of the different networks are presented in Table 6. The results indicate that the FPN network generally performs better than the U-Net. DenseNet201 FPN achieved the best segmentation performance with IoU and DSC of 91.85% and 94.13%, respectively. The second- and third-best networks were also FPN models, and the differences among the three were insignificant. Figure 5 shows the ability of the top three networks to segment the infected regions, even from small lung areas (Figure 6).

COVID-19 Detection
The COVID-19 detection performance of the lesion segmentation networks on the CT lung images is presented in Table 7. Since missing any COVID-19 positive case is critical, sensitivity is the primary metric that we consider in detection. All the networks achieved high sensitivity values (>99%); both the U-Net and FPN networks with DenseNet201 as the backbone achieved the best performance with 99.64% sensitivity, which indicates that the proposed approach is highly robust. Moreover, the FPN model with DenseNet201 as the backbone achieved a specificity of 98.72%, indicating a low false alarm rate.

Severity Classification Using MosMedData Dataset
A total of 1110 patients were provided in the MosMedData Dataset, which was used to test the performance of the proposed severity classification system. Lung and lesion masks were generated using the best-performing networks obtained from previous sections. Figure 7 shows three examples of predicted masks by both models: the best lung segmentation network (DenseNet 161 UNet) and the best lesion segmentation network (DenseNet201 FPN) on an entirely independent dataset. It can be seen that the cascaded networks were able to detect the lung borders very accurately and also performed well in detecting the main COVID-19 infection regions.

The infection percentage was calculated for each CT volume, and each volume was classified as healthy (CT0) or as having mild (CT1), moderate (CT2), severe (CT3), or critical (CT4) COVID-19 infection using the criteria given in the MosMedData Dataset, applied to the infection percentage produced by our quantification method. It should be noted that the ground truth of the CT0-CT4 classification was provided in the dataset and was obtained through visual inspection of the CT slices by professional radiologists. The confusion matrix for the classification of the 1110 patients is shown in Figure 8, and the quantitative evaluation is summarized in Table 8.

From the confusion matrix, it can be observed that the system can reliably classify critical (CT4) volumes with 100% sensitivity. Moreover, the majority of normal cases (CT0) were classified correctly; only 12 cases were misclassified, and for 8 of these 12 studies the proposed model reported a low infection percentage of 2-4%. Although the data description of the MosMedData Dataset [42] states that no viral pneumonia is shown in CT0 cases, other types of lung disease might be present in these cases. Another explanation is errors in the image acquisition phase: any motion, including breathing and body movements, can produce small artifacts in the images, which can confuse the lesion model. Figure 8 shows that all COVID-19 cases were detected as CT1, CT2, CT3, or CT4; none of the COVID-19 cases were predicted as normal (CT0). In other words, the system can distinguish COVID-19 patients from healthy cases across the entire test set. Thus, the severity classification performance matches the results obtained in the detection section. Furthermore, the proposed system showed lower sensitivity values for moderate (CT2) and severe (CT3) cases compared to CT0, CT1, and CT4; this can be related to how the dataset was labeled. The dataset was labeled by two radiologists using a visual semi-quantitative approach, and such an approach can lead to weak labeling [45].
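The mapping from infection percentage to severity class can be sketched as below. This is an illustration under stated assumptions, not the authors' method: the function names are hypothetical, the infection percentage is taken to be the ratio of predicted lesion voxels to predicted lung voxels, the 25/50/75% cut-offs follow the lung-involvement ranges commonly quoted for the MosMedData categories, and both the handling of exact boundary values and the `healthy_threshold` separating CT0 from CT1 are placeholders, since the section does not state them.

```python
def infection_percentage(lung_voxels, lesion_voxels):
    """Percentage of the segmented lung volume that is predicted as lesion."""
    if lung_voxels <= 0:
        raise ValueError("lung_voxels must be positive")
    return 100.0 * lesion_voxels / lung_voxels


def severity_class(percentage, healthy_threshold=0.0):
    """Map an infection percentage to a CT0-CT4 label.

    Assumptions: cut-offs of 25%, 50%, and 75% between CT1/CT2/CT3/CT4,
    upper bounds treated as inclusive, and a hypothetical `healthy_threshold`
    at or below which a volume is labeled CT0.
    """
    if percentage <= healthy_threshold:
        return "CT0"
    if percentage <= 25.0:
        return "CT1"
    if percentage <= 50.0:
        return "CT2"
    if percentage <= 75.0:
        return "CT3"
    return "CT4"


if __name__ == "__main__":
    # Illustrative voxel counts only; they are not taken from the dataset.
    pi = infection_percentage(lung_voxels=200_000, lesion_voxels=64_000)
    print(f"infection percentage: {pi:.1f}% -> {severity_class(pi)}")
```

Keeping the CT0 threshold as a parameter makes it easy to study how small predicted infections, such as the 2-4% cases discussed above, affect the CT0 versus CT1 decision.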

3D Modeling of Lung Volume with Lesion Visualization
A 3D model of the lung with infection segmentation was generated for each patient using the output of the lung and lesion segmentation networks. The proposed tool can help medical doctors assess the infection and evaluate its severity. Figure 9 shows 3D lung models from different views, with the COVID-19 infection rendered in red.

Conclusions
In this paper, we proposed a systematic approach for COVID-19 detection, lung and lesion segmentation, and patient severity grading from CT images. To find the best-performing deep learning models, we investigated several state-of-the-art segmentation networks. The proposed approach with the cascaded models achieved high performance in segmentation, classification, infection quantification, and 3D visualization. The main conclusions of this study can be summarized as follows.
For lung segmentation, DenseNet 161 U-Net outperformed the FPN network with different encoders. On the other hand, FPN with DenseNet201 encoder performed the best in lesion segmentation with a DSC value of 94.13%.
The proposed segmentation pipeline can generate accurate lung and lesion masks even for small lung areas. Such images and masks are typically excluded by many studies in the literature.
The right choice of encoders can significantly boost the performance of segmentation models. This study demonstrated that the DenseNet family outperforms ResNet in lung segmentation.
The proposed approach with the FPN DenseNet201 encoder model achieved the highest sensitivity of 99.64% in COVID-19 detection performance.
The system was able to classify the severity for COVID-19 patients based on Percentage of Infection (PI) by considering the output of lung and lesion segmentation networks and was able to discriminate between different severity levels of COVID-19 infection over a dataset of 1110 subjects with sensitivity values of 98.3%, 71.2%, 77.8%, and 100% for mild, moderate, severe, and critical infections, respectively.
In summary, computer-aided detection and quantification is an accurate, easy, and feasible method to diagnose COVID-19 cases.

Funding: This research was funded by Qatar University COVID19 Emergency Response Grant (QUERG-CENG-2020-1) and student grant (QUST-1-CENG-2021-7). The claims made herein are solely the responsibility of the authors.