1. Introduction
A new coronavirus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is the pathogen responsible for coronavirus disease 2019 (COVID-19), which has spread throughout the world since December 2019 [1,2,3,4,5,6,7,8,9]. COVID-19 was declared a pandemic by the World Health Organization on 11 March 2020 [10]. The clinical expressions of COVID-19 range from flu-like symptoms to respiratory failure, the management of which demands advanced respiratory assistance and artificial ventilation [11,12,13,14,15,16,17,18,19,20,21]. The clinical spectrum of COVID-19 pneumonia ranges from mild to critical cases, and the classification of ordinary, severe, and critical cases has been related to chest computed tomography (CT) findings [22,23]. CT imaging allows for the early detection of lung abnormalities in patients with SARS-CoV-2 pneumonia [24,25] and is a useful diagnostic tool, with a pooled sensitivity and specificity of 94% and 37%, respectively [26]. Additionally, approximately one-third of COVID-19 survivors showed fibrotic-like pulmonary changes on chest CT at six-month follow-up [27]; it has been speculated that some of these findings will resolve over time and therefore do not represent fibrosis [27]. Although these findings can be assessed visually, a quantitative evaluation based on software systems, which does not depend on the reader's experience, allows for greater accuracy of analysis, facilitates the evaluation of the data over time, and reduces the error of qualitative evaluation alone [8]. While several artificial intelligence (AI) models have been developed to automate COVID-19 diagnosis [11,13,17], COVID-19 lesion segmentation has received little study. Detecting regions of interest (ROIs) on CT scans is an interesting and challenging task for several reasons: (a) lesions vary widely in extent, location, shape, and quality, which makes them difficult to classify; (b) inter-class differences are small, because the margins of ground-glass opacity (GGO) are predominantly blurred and of low contrast, which complicates detection; (c) noisy annotation is inevitable for rare or new diseases (e.g., COVID-19), which degrades segmentation performance. Nevertheless, the quantitative assessment of infection and of longitudinal changes in CT findings could offer useful and vital information in the fight against COVID-19.
The aim of this retrospective study is to investigate the efficacy of two commercial software tools in the assessment of chest CT sequelae in patients affected by COVID-19 pneumonia, comparing the consistency of the two tools.
2. Materials and Methods
2.1. Patient Selection
This retrospective study included patients enrolled by the National Institute of Infectious Diseases Lazzaro Spallanzani Hospital, Rome, Italy.
Given the emergency period, the local institutional review board waived the requirement for informed consent for this retrospective study.
To homogenize the sample, only patients who underwent CT at discharge and at a 3-month follow-up (range: 30–237 days) were included.
The study group included 120 patients (56 women and 104 men; median age: 61 years; range: 21–93 years) with SARS-CoV-2 infection confirmed by real-time reverse transcription polymerase chain reaction (RT-PCR) on respiratory tract samples between 5 March 2020 and 15 March 2021.
As a control group, we selected 40 patients (median age: 60 years; range: 38–90 years) without lung disease who underwent chest CT at the same institute as a staging examination for colorectal cancer.
2.2. CT Technique
Chest CT scans were acquired on 128-slice Incisive CT scanners (Philips, Amsterdam, The Netherlands). CT examinations were performed with the patient supine, in inspiratory breath-hold, using a standard-dose protocol without intravenous contrast injection. The scanning range extended from the apex to the base of the lungs. The tube voltage and tube current were 120 kV and 100–200 mA, respectively (with z-axis tube current modulation, if applicable). All data were reconstructed with a 0.6–1.0 mm increment, and the matrix was 512 × 512. Images were reconstructed using a sharp reconstruction kernel for parenchymal evaluation and a hard reconstruction kernel for other lung evaluations. Multiplanar reconstructions were also generated. Details are provided in previous papers [8,11].
2.3. Qualitative Assessment
Four expert radiologists in the infectious disease field (each with at least 5 years of experience) independently reviewed the same CT series; discrepant findings were recorded and resolved in consensus. The qualitative evaluation included the presence of the following CT findings: (a) GGOs, (b) consolidation, (c) interlobular septal thickening, (d) fibrotic-like changes (reticular pattern and/or honeycombing), (e) bronchiectasis, (f) air bronchogram, (g) bronchial wall thickening, (h) pulmonary nodules surrounded by GGOs, (i) pleural and (j) pericardial effusion, (k) lymphadenopathy (defined as a lymph node with a short axis > 10 mm), and (l) emphysema.
All chest CT findings were defined according to the Fleischner Society glossary [28].
For each finding, the readers reported (1) location, (2) multilobe involvement, (3) total lobar involvement, and (4) bilateral distribution.
2.4. CT Post-Processing
Primary image data sets (0.6–1.0 mm) were transferred to the PACS workstation, and the same CT images were evaluated with two clinically available computer tools by the same four readers in consensus (automatic computerized quantification does not produce inter-reader discrepancies). The tools used were thoracic VCAR software (GE Healthcare, Chicago, IL, USA) and the pneumonia module of the ANKE ASG-340 CT workstation (HTS Med & Anke, Naples, Italy).
Table 1 compares the functionalities provided by the two commercial software tools.
2.4.1. Post-Processing with Thoracic VCAR Software
Thoracic VCAR software is a CE-marked medical device designed to quantify pulmonary emphysema in patients with chronic obstructive pulmonary disease. The tool segments the lungs and the airway tree. Moreover, it quantifies emphysema, healthy residual lung parenchyma, GGO, and consolidation based on Hounsfield units (HU). Details are provided in previous papers [8,11]. The total volumes of the right and left lungs were also calculated (Figure 1).
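To make the HU-based classification concrete, the following is a minimal Python sketch of how voxels inside an already segmented lung mask could be binned into emphysema, healthy parenchyma, GGO, and consolidation and then converted into volumes. The HU bands in `HU_BANDS` are hypothetical illustrative values: neither vendor's actual thresholds are given in the text, and the lung segmentation step that both tools perform is not reproduced here. The function names are likewise hypothetical.

```python
from collections import Counter

# Hypothetical HU bands (lower bound inclusive, upper bound exclusive).
# These are illustrative assumptions, NOT the thresholds used by thoracic VCAR
# or by the ANKE ASG-340 pneumonia module, which the text does not report.
HU_BANDS = [
    ("emphysema", float("-inf"), -950),
    ("healthy", -950, -700),
    ("ggo", -700, -250),
    ("consolidation", -250, 100),
]


def classify_voxel(hu):
    """Return the label of the HU band containing `hu`, or "other" if none does."""
    for label, low, high in HU_BANDS:
        if low <= hu < high:
            return label
    return "other"


def class_volumes_ml(lung_hu_values, voxel_volume_mm3):
    """Sum voxel volumes per class for voxels already inside a lung mask.

    lung_hu_values: iterable of HU values of the voxels inside the segmented lungs.
    voxel_volume_mm3: volume of one voxel (in-plane spacing squared times slice increment).
    Returns volumes in mL (1 mL = 1000 mm^3); voxels outside every band are ignored.
    """
    counts = Counter(classify_voxel(hu) for hu in lung_hu_values)
    return {label: counts[label] * voxel_volume_mm3 / 1000.0 for label, _, _ in HU_BANDS}


if __name__ == "__main__":
    # Toy example: six voxels of 1000 mm^3 (1 mL) each.
    voxels = [-980, -960, -800, -500, -100, 200]
    print(class_volumes_ml(voxels, voxel_volume_mm3=1000.0))
    # prints {'emphysema': 2.0, 'healthy': 1.0, 'ggo': 1.0, 'consolidation': 1.0}
```

Because the resulting volumes depend directly on where the band edges and the lung mask boundaries fall, two tools with different thresholds or segmentations can report different GGO or consolidation volumes for the same scan.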
2.4.2. Post-Processing with ANKE ASG-340 CT Workstation
The ANKE ASG-340 CT workstation from HTS Med & ANKE is a comprehensive CT workstation that includes lung nodule analysis, pneumonia analysis, a dental package, vascular analysis, and cerebral hemorrhage analysis, among other modules. The pneumonia module is designed to quantify lung involvement in patients with pneumonia. The software automatically segments the lungs and lung lobes, and automatically locates and measures pneumonia lesions, including their volume, CT value, and component analysis. It classifies voxels based on Hounsfield units (Figure 2), as previously described for the thoracic VCAR tool.
2.5. Statistical Analysis
The median value and range were calculated. The chi-square, Mann–Whitney, and Kruskal–Wallis tests were used to assess differences between groups. The Pearson correlation coefficient and the intraclass correlation coefficient (ICC) were used to analyze the correlation and variability among the quantitative measurements generated by the different tools [3].
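The text does not state which ICC form was used. The following minimal Python sketch assumes the two-way random-effects, absolute-agreement, single-measure form, ICC(2,1) in the Shrout and Fleiss notation, with each patient as a row and each tool as a column; the function names and toy volumes are hypothetical. It also computes the Pearson coefficient, which shows why the two statistics can disagree: a constant offset between tools leaves Pearson r at 1 but lowers an absolute-agreement ICC.

```python
import math


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    ratings: list of rows, one per subject, each with one value per tool/rater.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: subjects (rows), tools (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Toy volumes (mL): tool B reads exactly 1 mL higher than tool A for each patient.
    tool_a = [1.0, 2.0, 3.0]
    tool_b = [2.0, 3.0, 4.0]
    rows = [[a, b] for a, b in zip(tool_a, tool_b)]
    print(pearson_r(tool_a, tool_b))  # 1.0
    print(icc_2_1(rows))  # about 0.667
```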
A receiver operating characteristic (ROC) analysis was performed, and the area under the curve (AUC), sensitivity, specificity, positive predictive value, negative predictive value, and accuracy were obtained. A p value < 0.05 was considered significant for all tests.
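The threshold-based metrics listed above can be written out explicitly. The following is a minimal Python sketch, assuming that a volume at or above a chosen cut-off is called RT-PCR-positive; the cut-off, the function names, and the toy data are hypothetical, and the text does not state how cut-offs were selected. The AUC is computed as the fraction of positive/negative pairs in which the positive case has the higher value (ties count one half), which equals the Mann–Whitney U statistic divided by the number of pairs.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC as the probability that a random positive outscores a random negative."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def _ratio(num, den):
    # Return NaN instead of raising when a metric is undefined (empty denominator).
    return num / den if den else float("nan")


def threshold_metrics(pos_scores, neg_scores, cutoff):
    """Sensitivity, specificity, PPV, NPV, and accuracy for 'score >= cutoff' = positive."""
    tp = sum(s >= cutoff for s in pos_scores)
    fn = len(pos_scores) - tp
    fp = sum(s >= cutoff for s in neg_scores)
    tn = len(neg_scores) - fp
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
    }


if __name__ == "__main__":
    # Toy GGO volumes (mL) for three positive and three control patients.
    positives = [5.0, 7.0, 9.0]
    controls = [1.0, 3.0, 6.0]
    print(roc_auc(positives, controls))
    print(threshold_metrics(positives, controls, cutoff=5.0))
```

Evaluating `threshold_metrics` at every observed value and plotting sensitivity against 1 − specificity traces the empirical ROC curve whose area `roc_auc` returns.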
The statistical analyses were performed using the Statistics Toolbox of MATLAB R2007a (MathWorks, Natick, MA, USA).
3. Results
In the study group, 240 chest CT examinations (at discharge/baseline and at follow-up; follow-up range: 30–237 days) were analyzed.
3.1. Qualitative Assessment
At baseline, the patients had: GGOs (120; 100%); consolidation (108; 90.0%); interlobular septal thickening (120; 100%); fibrotic-like changes (reticular pattern and/or honeycombing) (116; 96.7%); bronchiectasis (80; 66.7%); air bronchogram (10; 8.3%); bronchial wall thickening (120; 100%); pulmonary nodules surrounded by GGOs (40; 33.3%); pleural (45; 37.5%) and pericardial effusion (6; 5%); lymphadenopathy (0; 0%), and emphysema (107; 89.2%).
All patients had a multilobe and bilateral distribution.
At follow-up, the patients had: GGOs (120; 100%); consolidation (120; 100%); interlobular septal thickening (120; 100%); fibrotic-like changes (reticular pattern and/or honeycombing) (120; 100%); bronchiectasis (120; 100%); air bronchogram (40; 33.3%); bronchial wall thickening (120; 100%); pulmonary nodules surrounded by GGOs (0; 0%); pleural (4; 3.3%) and pericardial effusion (0; 0%), and emphysema (107; 89.2%).
A statistically significant difference between baseline and follow-up was found in the prevalence of pulmonary nodules surrounded by GGOs and of pleural effusion (p < 0.01, chi-square test).
All patients had a bilateral distribution with multilobe involvement.
In the control group, we evaluated 40 chest CT examinations; the only finding identified was emphysema, present in 12 patients (30%).
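As an illustration of the chi-square comparison, the following minimal Python sketch applies Pearson's chi-square test without continuity correction to 2 × 2 tables built from the counts reported above (for example, pulmonary nodules surrounded by GGOs in 40/120 patients at baseline and 0/120 at follow-up). The text does not state whether a continuity correction was applied, so this sketch may not reproduce the authors' exact p values; like a standard chi-square test, it also treats the two time points as independent samples. The function name is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Rows are groups (e.g., baseline, follow-up); columns are finding present/absent.
    Returns (chi2, p) with one degree of freedom.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 df, chi2 = Z^2, so P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


if __name__ == "__main__":
    # Nodules surrounded by GGOs: baseline 40 present / 80 absent; follow-up 0 / 120.
    print(chi_square_2x2(40, 80, 0, 120))
    # Pleural effusion: baseline 45 / 75; follow-up 4 / 116.
    print(chi_square_2x2(45, 75, 4, 116))
```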
3.2. Quantitative Assessment
The thoracic VCAR software failed to perform volume segmentation in 12/280 (4.3%) cases, while the pneumonia module of the ANKE ASG-340 CT workstation failed in 19/280 (6.8%) cases.
Considering all patients, the ICC showed great variability among the quantitative measurements of the emphysema, residual healthy lung parenchyma, GGO, and consolidation volumes obtained by the two tools on baseline CT scans (Table 2).
When the control group was considered, a good ICC (≥0.6) was obtained for the GGO and consolidation volumes quantified by the two tools on baseline CT scans (Table 2).
An excellent ICC (≥0.75) was obtained for the residual healthy lung parenchyma, GGO, and consolidation volumes quantified by the two tools on follow-up CT scans (Table 3).
In addition, an excellent ICC (≥0.75) was obtained for the residual healthy lung parenchyma volume and GGO quantifications when the percentage change of these volumes was calculated between the baseline and follow-up examination.
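The cut-offs quoted above (good ≥ 0.6, excellent ≥ 0.75) appear consistent with the widely used Cicchetti guidelines for interpreting ICC values (poor < 0.40, fair 0.40–0.59, good 0.60–0.74, excellent ≥ 0.75). The text does not name the scheme, so the lower two bands in this minimal Python sketch are an assumption; the function name is hypothetical.

```python
def interpret_icc(icc):
    """Map an ICC value to a qualitative agreement band (Cicchetti-style cut-offs, assumed)."""
    if icc < 0.40:
        return "poor"
    if icc < 0.60:
        return "fair"
    if icc < 0.75:
        return "good"
    return "excellent"


if __name__ == "__main__":
    # ICC reported in the text for GGO volume quantification.
    print(interpret_icc(0.94))  # excellent
```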
The lowest variability was obtained for the GGO volume quantification (ICC = 0.94). The Pearson correlation analyses (Table 4) showed a low correlation for each of the quantitative volume measurements determined by the thoracic VCAR tool and the ANKE ASG-340 CT workstation pneumonia tool; only the GGO measurement showed a moderate correlation (Pearson correlation coefficient = 0.682, p < 0.01).
The lung volumes quantified with the thoracic VCAR tool on baseline CT scans differed significantly between RT-PCR-positive patients and the control group (p < 0.05) for all volumes except emphysema (Table 5, Figure 3).
By contrast, with the ANKE ASG-340 CT pneumonia software on baseline CT scans, only the GGO and consolidation volumes differed significantly between RT-PCR-positive patients and the control group (p < 0.05) (Table 6, Figure 4).
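The group comparisons above rely on the Mann–Whitney test listed in Section 2.5. Below is a minimal Python sketch of the U statistic with midranks for ties and a two-sided p value from the large-sample normal approximation (without tie correction); it is not necessarily the exact variant computed by the MATLAB Statistics Toolbox, and the function name and toy data are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Return (U for sample x, two-sided p) using midranks and a normal approximation."""
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y], key=lambda t: t[0])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        midrank = (i + j) / 2.0 + 1.0  # average of 1-based ranks i+1 .. j+1
        for t in range(i, j + 1):
            ranks[t] = midrank
        i = j + 1

    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2.0

    # Normal approximation to the null distribution of U (no tie correction).
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u1 - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return u1, p


if __name__ == "__main__":
    # Toy consolidation volumes (mL): ten positive patients vs. ten controls.
    positives = [float(v) for v in range(10, 20)]
    controls = [float(v) for v in range(0, 10)]
    print(mann_whitney_u(positives, controls))
```

For small groups an exact permutation distribution is usually preferred over the normal approximation used here.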
Table 7 reports the percentage change in volumes between baseline and follow-up in RT-PCR-positive patients in terms of median, minimum, and maximum values.
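The percentage change is presumably computed per patient as 100 × (follow-up volume − baseline volume) / baseline volume; the text does not give the formula, so this definition is an assumption. A minimal Python sketch of that computation and of the median/minimum/maximum summary follows; the function names and toy volumes are hypothetical.

```python
import statistics


def percent_change(baseline, follow_up):
    """Per-patient percentage change, 100 * (follow_up - baseline) / baseline."""
    changes = []
    for b, f in zip(baseline, follow_up):
        if b == 0:
            raise ValueError("baseline volume of zero: percentage change undefined")
        changes.append(100.0 * (f - b) / b)
    return changes


def summarize(values):
    """Median, minimum, and maximum, the summary statistics used in the tables."""
    return {"median": statistics.median(values), "min": min(values), "max": max(values)}


if __name__ == "__main__":
    # Toy GGO volumes (mL) at baseline and follow-up for three patients.
    base = [100.0, 200.0, 50.0]
    follow = [80.0, 220.0, 50.0]
    changes = percent_change(base, follow)
    print(changes)  # [-20.0, 10.0, 0.0]
    print(summarize(changes))
```

A baseline volume of zero (for example, no GGO detected at discharge) makes the ratio undefined; the sketch raises an error rather than silently dropping such patients.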
The lung volumes quantified by the two tools on follow-up CT scans are reported in Table 8 in terms of median, minimum, and maximum values.
Table 9 shows the ROC analysis results for the volumes obtained on baseline CT scans with both tools. The highest accuracy in identifying RT-PCR-positive patients was obtained by the total GGO volume quantified with the thoracic VCAR tool (accuracy = 0.75).
Considering the results obtained by the ANKE ASG-340 CT pneumonia tool, the consolidation volume of the left lung obtained the highest accuracy, equal to 0.
4. Discussion and Conclusions
In this study, we evaluated the efficacy of quantitative analysis of chest CT sequelae in patients affected by COVID-19 pneumonia, comparing the consistency of two computerized tools. The visual evaluation of longitudinal changes on CT scans by radiologists is often a tedious task. A simple and fast automated method is needed that can segment and quantify infected regions in order to evaluate disease progression in infected patients on lung CT scans [29,30,31,32,33,34,35]. Additionally, an objective evaluation by means of AI systems allows the data to be quantified and, therefore, disease progression to be defined accurately; this is otherwise not very robust when entrusted to visual inspection alone [36,37,38]. Recently, several computer tools have been proposed for the recognition of COVID-19 lung lesions on CT examinations [39,40,41]. However, many of them are neither approved as medical devices nor CE-marked. Furthermore, the variability in the results reported for these tools makes it difficult to choose the most accurate system [8].
To the best of our knowledge, this is the first study to compare different computer tools for the assessment of chest CT sequelae in patients affected by COVID-19 pneumonia. We demonstrated great variability among the quantitative measurements of the emphysema, residual healthy lung parenchyma, GGO, and consolidation volumes obtained by different computer tools on baseline CT scans. By contrast, a good ICC was obtained for the GGO and consolidation volumes obtained by the two tools on baseline CT scans when the control group was considered. Moreover, an excellent ICC was obtained for the residual healthy lung parenchyma, GGO, and consolidation volumes obtained by the two tools on follow-up CT scans, and for the residual healthy lung parenchyma and GGO volumes when their percentage change between the baseline and follow-up scans was calculated. The lowest variability in quantification was obtained for the GGO volume.
The Pearson correlation analyses showed a low correlation between the quantitative volume measurements determined by the thoracic VCAR tool and the ANKE ASG-340 CT workstation pneumonia tool; only the GGO measurement showed a moderate correlation.
We think that the greater variability found at baseline is linked to the complexity of the cases analyzed in this phase, which could affect the accuracy of lesion segmentation. As demonstrated by Herrmann et al. [42], image segmentation is especially difficult in acute respiratory distress syndrome (ARDS), since in some cases it is almost impossible to discriminate the edge of the lung parenchyma from a pleural effusion, particularly in the most dependent lung regions and in the most severe forms of ARDS. They also observed differences in lung weight at different airway pressures. These variations may result from the segmentation procedure and/or from actual changes in lung weight, primarily due to a possible airway pressure-dependent blood shift. Unfortunately, it is impossible to determine how much of the weight variation is due to an intrathoracic blood shift and how much to inaccuracies of the segmentation process. The decrease in intrathoracic blood volume with increasing airway pressure estimated in a previous work was about 100 mL, leading to a small decrease in lung weight [43].
We therefore believe that at follow-up, with less extensive pulmonary involvement, the variability between the two systems is partially reduced, since segmentation is simpler in the absence of confounding factors such as pleural effusion and increased pressure in the pulmonary vessels; the resolution of these factors facilitates the classification of the different pixels [44].
The main critical points of the thoracic VCAR tool were as follows: automatic segmentation does not include areas of abundant consolidation of the lung parenchyma or conspicuous pleural effusions, which then require manual segmentation; manual lung segmentation is difficult; and its correction, performed slice by slice, is time-consuming.
The main critical points of the ANKE ASG-340 CT workstation pneumonia tool were as follows: the analysis is slow (median of 120 s, compared with 10 s for the thoracic VCAR tool); it overestimates emphysema; and it cannot segment complex cases with conspicuous effusion and/or extensive consolidation.
Moreover, neither tool recognizes several CT findings typical of the evolution of the disease, such as (a) interlobular septal thickening, (b) fibrotic-like changes (reticular pattern and/or honeycombing), (c) bronchiectasis, (d) air bronchogram, (e) bronchial wall thickening, (f) pulmonary nodules surrounded by GGOs, (g) pleural and (h) pericardial effusion, and (i) lymphadenopathy; these features are instead included in other classes, which reduces the accuracy of the assessment of fibrotic-like changes.
According to Johns Hopkins University, the case-fatality rate of COVID-19 ranges between 1% and 7%, depending on the number of days since the first confirmed case, testing efficacy, local pandemic response policies, and population age [45,46,47,48,49]. Multi-organ manifestations of COVID-19 are now well documented [50,51,52,53,54,55,56,57], but their potential long-term implications remain to be uncovered. Several studies have reported impaired exercise capacity and impaired diffusing capacity for carbon monoxide (DLCO) in SARS-CoV-1 survivors over follow-up periods of 6 months to 15 years [58,59,60,61,62,63,64], suggesting impairment of the intra-alveolar diffusion pathway. In this scenario, tools that objectively stratify patients by their risk of developing chronic diseases, which can affect their quality of life and carry an economic burden for health care, are clearly important [65,66]. We believe that the computerized assessment of CT findings could identify pulmonary abnormalities and lung recruitment, and that knowing how the percentage of potentially recruitable lung evolves may be important for establishing the therapeutic management of chest sequelae in patients affected by COVID-19 pneumonia.
The present study has several strengths: first, the homogeneity of the sample and the three-month follow-up; second, all CT examinations were performed at the same center, reducing the variability linked to different equipment; third, the high level of expertise of the radiologists who analyzed the images.
The major limitation, common to the evaluation of both tools, is the lack of correlation between radiological data and clinical/functional data. It would be useful to evaluate how CT findings relate to functional investigations such as spirometry and/or lung scintigraphy; however, these data are available only for a small part of the study population.
In summary, computer-aided quantification could be an easy and feasible way to assess chest CT sequelae due to COVID-19 pneumonia; however, the large variability among the measurements provided by different tools should be considered.