Chest CT Computerized Aided Quantification of PNEUMONIA Lesions in COVID-19 Infection: A Comparison among Three Commercial Software

Purpose: To compare different commercial software in the quantification of pneumonia lesions in COVID-19 infection and to stratify patients by disease severity using chest computed tomography (CT) images. Materials and methods: We retrospectively examined 162 patients with COVID-19 infection confirmed by reverse transcriptase-polymerase chain reaction (RT-PCR) test. All cases were evaluated separately by radiologists (visually) and by three computer software programs: (1) Thoracic VCAR software, GE Healthcare, United States; (2) Myrian, Intrasense, France; (3) InferRead, InferVision Europe, Wiesbaden, Germany. The degree of lesions was visually scored by the radiologist on a 5-level scale (none, mild, moderate, severe, and critical). The parameters obtained using the computer tools included healthy residual lung parenchyma, ground-glass opacity (GGO) area, and consolidation volume. Intraclass correlation coefficient (ICC), Spearman correlation analysis, and non-parametric tests were performed. Results: Thoracic VCAR software was not able to perform volume segmentation in 26/162 (16.0%) cases, Myrian software in 12/162 (7.4%) cases, and InferRead software in 61/162 (37.7%) cases. A great variability (ICC ranging from 0.17 to 0.51) was detected among the quantitative measurements of the residual healthy lung parenchyma, GGO, and consolidation volumes calculated by the different computer tools. The overall radiological severity score was moderately correlated with the residual healthy lung parenchyma volume obtained by Thoracic VCAR or Myrian software, with the GGO area obtained by the Thoracic VCAR tool, and with the consolidation volume obtained by Myrian software. Volumes quantified by InferRead software had a low correlation with the overall radiological severity score.
Conclusions: Computer-aided pneumonia quantification could be an easy and feasible way to stratify COVID-19 cases according to severity; however, a great variability among quantitative measurements provided by computer tools should be considered.


Introduction
The spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has already assumed pandemic proportions, affecting over 100 countries in a few weeks. A global response is needed to prepare health systems worldwide [1,2]. COVID-19 can be diagnosed both on chest X-ray and on computed tomography (CT). Asymptomatic patients may also have lung lesions on imaging. CT investigation in patients with suspected COVID-19 pneumonia involves the use of the high-resolution technique. Artificial intelligence (AI) software for the quantification of pneumonia lesions has been employed to facilitate CT diagnosis [3][4][5].
Several radiological organizations do not recommend CT as a primary diagnostic/screening tool for COVID-19 [6][7][8][9] or have excluded CT findings from their diagnostic criteria [10]. However, the diagnosis of viral pneumonia on chest CT plays an important role in the management of patients with suspected SARS-CoV-2 infection, especially since, in the absence of proven therapies for COVID-19, the early start of mild invasive ventilation has proven effective in reducing the severity of pneumonia [11,12]. Radiologists focus on the main CT findings (ground-glass opacity, GGO, and consolidation) and on lesion distribution (left, right, or bilateral lungs) [10]. Bilateral distribution of GGOs, with or without consolidation, in the posterior and peripheral lungs was initially described as a characteristic feature of COVID-19 [11,12].
Machine learning-based technologies and computer tools are playing a substantial role in the COVID-19 pandemic. Experts are using machine learning to study the virus, test potential treatments, diagnose individuals, analyze the public health impacts, and more. Computer software could be useful for categorizing the disease into different severities through quantitative, objective assessment of the extent of the lesions [13][14][15][16].
Computer tools have recently been proposed for the recognition of COVID-19 lung lesions on CT examination, many of which are Chinese [17][18][19]. However, many of them are not recognized as medical devices, nor do they have the CE marking. Furthermore, they have been tested on thousands of cases of COVID-19 but not on as many cases of non-COVID-19 coronavirus, which affects their ability to make a differential diagnosis. The recognition of interstitial pneumonia lesions on a chest CT scan does not pose any difficulties, and therefore the role of computer tools remains limited to the numerical quantification of the lesions and their distribution.
Proof of principle of the diagnostic capability of deep learning methods from CT images to screen for COVID-19 has been recently demonstrated by Wang et al. [20] on 1119 CT images of pathogen-confirmed COVID-19 cases versus typical viral pneumonia. The accuracy and applicability of deep learning for screening COVID-19 from CTs have however been questioned, based on concerns of the radiologists' association and given the impact of selection bias reported in first published results.

Patient Characteristics
This retrospective study included patients enrolled at "Bergamo Est" hospital, Bergamo, at "AORN Giuseppe Moscati", Avellino, and at "University Vanvitelli", Napoli. Given the ongoing epidemic emergency, the local institutional review boards waived written informed consent for this retrospective study, which evaluated de-identified data and involved no potential risk to patients. The population included 162 patients (57 women and 105 men; median age 67 years, range 26-93 years) whose COVID-19 infection was confirmed between 23 February 2020 and 31 March 2020 by nucleic acid amplification testing of respiratory tract or blood specimens using a reverse transcription real-time fluorescence polymerase chain reaction test. The etiological diagnosis was therefore made with the current gold standard test (RT-PCR).

CT Technique
Chest CT was performed at the time of hospital admission on one of three CT scanners: two 128-slice scanners (Ingenuity, Philips, Amsterdam, Netherlands, and Revolution, GE Healthcare, Chicago, IL, USA) and one 64-slice scanner (Toshiba Aquilion 64 Slices, Tokyo, Japan). CT examinations were performed with the patient in the supine position, during breath hold at end inspiration, using a standard dose protocol without intravenous contrast injection. The scanning range was from the apex to the base of the lungs. The tube voltage and the tube current were 120 kV and 100-200 mA, respectively. All data were reconstructed with a 0.6-1.0 mm increment. The matrix was 512 × 512. Images were reconstructed using a sharp reconstruction kernel for parenchyma and viewed at window settings optimized for the assessment of the lung parenchyma (window width: 1600 HU; window level: −600 HU).

CT Post-Processing
DICOM data were transferred into a PACS workstation and CT images were evaluated using three clinically available computer tools: Thoracic VCAR software (GE Healthcare, Chicago, Illinois, United States); Myrian software (Intrasense, France); InferRead tool (InferVision Europe, Wiesbaden, Germany). Table 1 reports a comparison among evaluated commercial software based on the provided functionalities.

Post-Processing with Thoracic VCAR Software
Thoracic VCAR software is a CE-marked medical device designed to quantify pulmonary emphysema in patients with Chronic Obstructive Pulmonary Disease. The software provides automatic segmentation of the lungs and automatic segmentation and tracking of the airway tree. It classifies voxels based on Hounsfield Units (HU) and provides a color-coded display of the thresholds within a segmented region. We analyzed the CT scans of patients with confirmed COVID-19 pneumonia by pre-setting HU threshold values to obtain a segmentation of both lungs and a quantitative evaluation of emphysema (−1024/−977; blue), healthy residual lung parenchyma (−976/−703; yellow), GGO (−702/−368; pink), and consolidation (−100/5; red). Finally, volumes for both the right and left lung were calculated (Figure 1).
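As an illustration of the threshold-based classification described for Thoracic VCAR, the following is a minimal sketch and not the vendor's implementation. The HU ranges are the ones quoted in the text; the function names, the toy voxel list, and the voxel volume are hypothetical and chosen only for the example.

```python
# Illustrative sketch only: the HU ranges come from the text, everything else
# (names, toy data, voxel size) is an assumption made for this example.
VCAR_RANGES = {
    "emphysema": (-1024, -977),
    "healthy": (-976, -703),
    "ggo": (-702, -368),
    "consolidation": (-100, 5),
}


def classify_hu(hu, ranges=VCAR_RANGES):
    """Return the label whose inclusive HU range contains `hu`, else None."""
    for label, (low, high) in ranges.items():
        if low <= hu <= high:
            return label
    return None  # e.g. values between -367 and -101 fall in no listed class


def class_volumes_ml(hu_values, voxel_volume_mm3, ranges=VCAR_RANGES):
    """Count voxels per class and convert to millilitres (1 mL = 1000 mm^3)."""
    counts = {label: 0 for label in ranges}
    for hu in hu_values:
        label = classify_hu(hu, ranges)
        if label is not None:
            counts[label] += 1
    return {label: n * voxel_volume_mm3 / 1000.0 for label, n in counts.items()}


# Toy example: seven voxels from an already segmented lung, 0.5 mm^3 each.
toy_voxels = [-1000, -850, -800, -600, -500, -50, -200]
toy_volumes = class_volumes_ml(toy_voxels, voxel_volume_mm3=0.5)
```

The same function can be called with the Myrian ranges (−1000/−801, −800/−400, −399/69) passed as `ranges`, which makes it easy to see how different presets assign the same voxel to different classes.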

Post-Processing with Myrian Software
The Myrian solution developed by Intrasense automatically provides an objective measurement of the impairment and of the available pulmonary reserve of patients, allowing the identification of healthy and pathological areas (ground-glass and consolidation areas). These elements provide the pulmonary reserve as well as a density histogram over the complete pulmonary volume. Moreover, the system automatically generates structured diagnosis reports and follow-up reports for pneumonia cases. We analyzed the CT scans using Myrian software and recorded the healthy parenchyma (−1000/−801), GGO (−800/−400), and consolidation (−399/69) volumes (Figure 2).

Post-Processing with InferRead Software
[...] status and help clinicians arrange further exams. Moreover, the system automatically generates structured diagnosis reports and follow-up reports for pneumonia cases. We analyzed the CT scans using the InferRead system and recorded the volumes at different densities (Figure 3). We then calculated the GGO volume (summing the volumes in the ranges −570/−470, −470/−370, and −370/−270), the consolidation volume (summing the volumes in the ranges −170/−70, −70/30, and 30/60), and the healthy parenchyma volume (the remaining lung volume).
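The derivation of GGO, consolidation, and healthy parenchyma volumes from the InferRead density bins can be sketched as follows. This is a hedged illustration: the bin limits are those given in the text, while the function name, the input layout (a dictionary keyed by HU range), the toy numbers, and the reading of "lung volume remaining" as total lung volume minus GGO and consolidation are assumptions.

```python
# Illustrative sketch only. Bin limits are quoted from the text; the data
# layout and the toy values below are hypothetical.
GGO_BINS = [(-570, -470), (-470, -370), (-370, -270)]
CONSOLIDATION_BINS = [(-170, -70), (-70, 30), (30, 60)]


def summarize_bins(bin_volumes_ml, total_lung_ml):
    """Aggregate per-bin volumes (mL) into GGO, consolidation, and healthy."""
    ggo = sum(bin_volumes_ml.get(b, 0.0) for b in GGO_BINS)
    consolidation = sum(bin_volumes_ml.get(b, 0.0) for b in CONSOLIDATION_BINS)
    # Assumption: "remaining lung volume" = total minus the two lesion classes.
    healthy = total_lung_ml - ggo - consolidation
    return {"ggo": ggo, "consolidation": consolidation, "healthy": healthy}


# Toy per-bin output for one patient (values in mL, invented for the example).
toy_bins = {
    (-570, -470): 120.0,
    (-470, -370): 80.0,
    (-370, -270): 50.0,
    (-270, -170): 40.0,  # reported bin that belongs to neither lesion class
    (-170, -70): 30.0,
    (-70, 30): 20.0,
    (30, 60): 10.0,
}
toy_summary = summarize_bins(toy_bins, total_lung_ml=4000.0)
```

Under this reading, any bin outside the listed ranges (here −270/−170) is counted as healthy parenchyma, which is one possible source of disagreement with tools that use contiguous thresholds.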

Radiologists Analysis
For each lung, the radiologists reviewed the CT images and assigned a severity score for pulmonary involvement on a 5-level scale (none; mild: ≤25% involvement; moderate: 26-50% involvement; severe: 51-75% involvement; critical: 76-100% involvement). An overall radiological severity score was then obtained by summing the scores of the two lungs, classifying a total of ≤2 as mild, 3-4 as moderate, 5-6 as severe, and 7-8 as critical. Two radiologists with more than 10 years of experience in thoracic imaging evaluated the severity of the images in a double-blind manner. A third, more experienced radiologist resolved any disagreement between the two.
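The scoring rule above can be written out as a short sketch. The cut-offs are those stated in the text; the function names are hypothetical, and treating the percentage bands as continuous intervals (for example, anything above 25% and up to 50% is moderate) is an assumption, since the text gives integer bands.

```python
# Illustrative sketch of the visual scoring rule; names are hypothetical.
def lung_score(involvement_pct):
    """Map the percentage of involved lung to the per-lung score 0-4."""
    if not 0 <= involvement_pct <= 100:
        raise ValueError("involvement must be between 0 and 100")
    if involvement_pct == 0:
        return 0  # none
    if involvement_pct <= 25:
        return 1  # mild
    if involvement_pct <= 50:
        return 2  # moderate
    if involvement_pct <= 75:
        return 3  # severe
    return 4      # critical


def overall_severity(left_pct, right_pct):
    """Sum the two per-lung scores (0-8) and return (total, class label)."""
    total = lung_score(left_pct) + lung_score(right_pct)
    if total <= 2:
        label = "mild"  # as written in the text, a total of 0 also falls here
    elif total <= 4:
        label = "moderate"
    elif total <= 6:
        label = "severe"
    else:
        label = "critical"
    return total, label
```

For example, a patient with 30% involvement of the left lung and 60% of the right lung scores 2 + 3 = 5 and is classified as severe.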

Statistical Analysis
Continuous data were expressed in terms of median value and range. Mann-Whitney test and Kruskal-Wallis test were used to verify differences between groups. Spearman correlation coefficient and intraclass correlation coefficient were used to analyze the correlation and variability among quantitative measurements generated by different computer tools and between radiological severity score obtained by the radiologists and quantitative results generated by the computer software.
A p-value < 0.05 was considered significant for all tests. All analyses were performed using IBM SPSS Statistics 24 (IBM, Armonk, NY, USA).
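For readers unfamiliar with the Spearman coefficient used here, the following minimal sketch computes it as the Pearson correlation of average ranks. It is not the SPSS procedure used in the study; the function names and the toy data are hypothetical, and no p-value is computed.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if a series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def spearman(x, y):
    """Spearman rank correlation between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Toy data: overall severity scores and GGO volumes (mL) for five patients.
toy_scores = [1, 2, 3, 5, 7]
toy_ggo_ml = [100.0, 300.0, 200.0, 500.0, 900.0]
toy_rho = spearman(toy_scores, toy_ggo_ml)
```

Because the coefficient depends only on ranks, it suits an ordinal severity score compared against skewed volume measurements, which is consistent with the non-parametric tests chosen for the rest of the analysis.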

Results
The ICC showed great variability among the quantitative measurements of the residual healthy lung parenchyma, GGO, and consolidation volumes obtained by the different computer tools (Table 2). The lowest variability was reported for GGO volume. The Spearman correlation analyses (Table 3) showed a moderate correlation between the lesion percentages determined by Thoracic VCAR and Myrian software (range 0.54 to 0.78, all p < 0.05), a low or mild correlation between those determined by Thoracic VCAR and InferRead software (range 0.34 to 0.50, all p < 0.05), and a low or mild correlation between those determined by Myrian and InferRead software (range 0.31 to 0.61, all p < 0.05).
The lung volumes quantified using the Thoracic VCAR tool were significantly lower in patients with severe disease than in those without severe disease (p < 0.05) for the residual healthy lung parenchyma and GGO volumes (Table 4). With Myrian software, only the residual healthy lung parenchyma and consolidation volumes showed statistically significant differences among patients with different severity scores (Table 5), while with the InferRead tool only the residual healthy lung parenchyma and GGO volumes showed statistically significant differences (Table 6).
The overall radiological severity score was moderately correlated with the residual healthy lung parenchyma volume obtained by Thoracic VCAR or Myrian software (Spearman coefficient = 0.70-0.74), with the GGO area obtained by the Thoracic VCAR tool (Spearman coefficient = 0.65), and with the consolidation volume obtained by Myrian software (Spearman coefficient = 0.65) (Tables 4 and 5). By contrast, low correlations were found between the overall radiological severity score and each quantitative measurement obtained by InferRead software (Table 6).

Discussions
Several publications have described the role of X-rays and the CT imaging features in patients affected by COVID-19, the evolution of these features over time, and the radiologist's performance in differentiating COVID-19 from other viral infections [10][11][12][13]. These studies have shown that typical CT findings of COVID-19 infection occur with two different patterns: peripheral, bilateral GGO with or without consolidation or intralobular lines ("crazy paving"), and multifocal GGO of rounded morphology with or without consolidation or "crazy paving" [20]. The less typical pattern is characterized by non-peripheral, non-rounded GGO with multifocal, diffuse, perihilar, or unilateral distribution, with or without consolidations [21,22].
Colombi et al. [32] reported that, in patients with confirmed COVID-19 pneumonia, visual or software quantification of the extent of CT lung abnormality was a predictor of Intensive Care Unit (ICU) admission or death. They reported that the proportion of well-aerated lung assessed by chest CT was associated with better prognosis independently of other clinical parameters. Gozes et al. [19] used 2D and 3D deep learning models to explore AI-based automated CT image analysis tools for the detection, quantification, and tracking of coronavirus. One hundred and six COVID-19 chest CT scans and 99 normal ones were used to find potential COVID-19 thoracic CT features and to evaluate the progression of the disease in patients over time, generating a quantitative score.
To the best of our knowledge, this manuscript is the first to compare different computer tools for the quantification of pneumonia lesions on chest CT in COVID-19 patients.
We demonstrated that there was great variability among the quantitative measurements obtained by different commercial computerized tools. Moreover, we reported statistically significant differences in the volumes of residual healthy lung parenchyma, GGO, and consolidation between patients with severe disease and those without, as defined by the overall radiological score. In addition, we reported that the overall radiological severity score was moderately correlated with the residual healthy lung parenchyma volume obtained by Thoracic VCAR or Myrian software, with the GGO area obtained by the Thoracic VCAR tool, and with the consolidation volume obtained by Myrian software. InferRead software, by contrast, had a low correlation with the overall radiological severity score. Therefore, considering our results, the Thoracic VCAR and Myrian tools seem to be the most effective and easiest software programs for automatic quantitative measurement in COVID-19 patients, because they provide a semi-automatic and fast segmentation of lesions and a visualization of pathological lung areas (ground-glass opacities, crazy paving, consolidations, emphysematous areas). These tools can therefore be used in clinical practice to assist radiologists' diagnoses.
An ideal software for COVID-19 should have automatic recognition of internal lung fields; the possibility to exclude airways and pulmonary vessels; automatic recognition of peripheral pulmonary vessels of increased caliber; automatic recognition of a pulmonary artery of increased caliber (over 2.9 cm); the possibility of calculating the percentage of emphysematous parenchyma, GGO, consolidation, and well-ventilated residual lung parenchyma; distinct percentages for lobes, lungs, and total; the possibility of transferring these percentage values into the report without copying them manually; and the possibility to store the automatic quantification of lesion volumes for comparison with a subsequent examination of the same patient.
The limitations of this study include its retrospective nature and the sample size, which may have influenced the great variability observed among the three computer tools in lung volume quantification.
In the future, examining the correlation between quantitative CT parameters and clinical symptoms and laboratory indices would be useful for guiding clinical decision-making.
In summary, computer-aided quantification could be an easy and feasible way to stratify patients according to COVID-19 disease severity; however, the great variability among quantitative measurements provided by different commercial computerized tools should be considered.