Chronic obstructive pulmonary disease (COPD) is a common and largely avoidable disease that is characterized by irreversible airway obstruction, attributable to inhaled noxae and particles. Therapeutic decisions in patients with COPD have largely relied on spirometry. However, given the 2017 GOLD report, spirometry is no longer valid for therapeutic decision-making. Currently, medication is chosen by clinical criteria without widespread availability of objective, quantifiable diagnostic tools [1
Quantified computed tomography (qCT) is an emerging technique in the complex field of COPD diagnostics. First described in 1988 by Muller et al., they used standardized, visual “density masks” to quantify emphysema in patients with COPD [2
]. With advances in scanner technologies and evaluation algorithms, increasing amounts of information can be gathered from non-contrast-enhanced chest CTs in COPD. For example, bronchial airway parameters were shown to correlate not only with lung function parameters but also with exacerbation rates and clinical symptoms [3
]. Additionally, significant pathological findings in non-contrast-enhanced chest CT in patients with COPD, such as emphysema and air trapping, are associated with an increased mortality [6
]. On the basis of this evidence, the American Thoracic Society and the European Respiratory Society proposed to evaluate routine chest CT in patients with newly diagnosed COPD [9
In light of these findings, as well as emergent research in this field, qCT is more likely to contain additionaly, comprehensive COPD-related information than is currently understood and utilized.
Machine-based learning algorithms comprise emergent techniques to evaluate potential connections between imaging data and clinical parameters or outcomes [10
]. Recently, several commercially available software solutions, using algorithms similar to those in stroke imaging and diagnosis, have been successfully evaluated [12
]. However, the potential benefits of these methods have currently not been applied to the field of qCT in patients with COPD.
Therefore, the aim of this study was to evaluate the feasibility of predicting lung function parameters with radiomics features, acquired by non-contrast-enhanced chest qCT, in patients with COPD. Various mathematical models as well as machine learning-based algorithms were evaluated.
The HIPAA-compliant study protocol, which is in accordance to the Declaration of Helsinki, was approved by our local ethics committee (2015-415M-MA).
We prospectively enrolled 75 consecutive patients with previously diagnosed COPD and a clinical indication for non-contrast-enhanced chest CT in a single center approach. Written informed consent was obtained from all patients following a full explanation of the purpose of the study and of potential risks and discomfort associated with their participation.
2.2. Lung Function Testing
All patients underwent whole-body plethysmography (MasterScreen® Body, CareFusion, Höchberg, Germany) which yielded the following parameters: forced expiratory volume in one second (FEV1
), ratio of residual volume to total lung capacity (RV%TLC), and Tiffeneau index (FEV1
/VC). Apart from RV%TLC, all plethysmographic values are represented as the percentage of the expected value, as calculated according to current ATS/ERS recommendations [13
2.3. Computerized Tomography Examinations
A non-contrast chest scan was performed in maximum inspiration and maximum end-expiration using a third-generation dual-source CT system (Somatom FORCE, Siemens Healthineers, Forchheim, Germany) at 100 kVp with an additional tin filter behind the source for dose reduction [15
]. In addition to a pre-imaging briefing, the patients received voice commands for optimal inspiration and expiration results. Existing medication was not withheld prior to imaging. The scan parameters were as follows: 100 kVp tube voltage, automated tube current modulation using 96 mAs at 120 kVp as reference (effective mAs = 166.5 ± 105), 0.25 s rotation time, pitch 1.2, 192 mm × 0.6 mm detector collimation. All images were reconstructed with a slice thickness of 1.5 mm, using a suitable reconstruction kernel for quantitative lung analysis (Br32) and a third-generation, iterative reconstruction technique (Adaptive Model-based Iterative reconstruction [ADMIRE], Siemens Healthineers, Germany). The reconstruction algorithm was explained in a recent publication by Gordic et al. [17
]. An iterative strength level of four (out of a maximum of five) was chosen for the present study for optimum image noise, as recommended by the CT vendor for quantitative lung analysis. The average CTDI was 0.48 ± 0.19 mGy and the mean DLP 17.2 ± 6.5 mGy·cm.
2.4. Image Analysis
Inspiratory and expiratory datasets were analyzed using dedicated semi-automatic software (SyngoViaVB10, Pulmo3D, Siemens Healthineers, Forchheim, Germany). Lung segmentation was automated and manually revised only if necessary. Four quantitative parameters were acquired: total lung volume (volume), mean lung density (MLD), full-width-half-max (FWHM), and low attenuation volume (LAV). The LAV threshold for emphysema was set to −950 HU. This cut-off has been extensively evaluated in previous studies and strongly correlates with microscopic and gross emphysema [18
]. FWHM marks the width at the half maximum of the voxel count to a specific HU value curve, representing the density distribution of the lung parenchyma. The difference of the values between inspiratory and expiratory scans were defined as delta values and were used as a distinct dataset in this evaluation. The use of this additional delta data has shown to be beneficial in previous studies [3
2.5. Model Training and Evaluation
Five models were used for the prediction of lung function values: mean prediction, median prediction, polynomial regression, k-nearest neighbor regression (kNN), an additive regression with regression trees in XGBoost, and an artificial neural network multilayer perceptron regression (Tensorflow, Google, Mountain View, CA, USA). All breathing states (inspiration, expiration, delta) and parameters (volume, MLD, LAV and FWHM) were used as input data for every model.
Mean and median prediction was performed, using target lung function values. Therefore, both methods yielded the same results for inspiration, expiration, and delta values. To find best performance, polynomial regression and kNN were performed using different degrees and k values, respectively. The polynomial regression was calculated for every degree from 1 to 10 because the performance peak had already been reached after the third degree. The kNN regression was performed for all k-values between 1 and 74 because 75 patients were included in the study. The approach of finding the ideal k and degree can be seen in Figure 1
. Weighted voting based on distance was used to combine the values of the k neighbors.
XGBoost and multilayer perceptron largely remained at their default settings. For XGBoost, this meant a total number of 100 trees, with a maximal depth per tree of three and a boosting learning rate of 0.1. The multilayer perceptron had five neurons building the input layer, two hidden layers with 256 neurons each, and one output neuron.
2.6. Model Evaluation and Statistical Analysis
“Leave one out” cross validation was used to measure the prediction performance for all models and parameters, which has been shown to be beneficial for small cohorts [21
]. The mean absolute and the corresponding mean relative error (MRE) were used as markers for prediction performance [23
]. Both values were compared for every model, breathing state, and parameter using ANOVA and corrected by Tuckey HSD for multiple testing via dedicated software (JMP 12, SAS, Cary, CA, USA). A p
-value of less than 0.05 was considered statistically significant.
This work evaluated the feasibility of predicting lung function values by parameters derived from qCT using different mathematical models. We demonstrated that this prediction can be performed within a reasonable margin of error with regard to differences in target values and prediction models. Overall, kNN and polynomial regression showed the lowest MREs in our study. As shown in previous studies, kNN and polynomial regression are suitable methods for prediction models in small sample sizes [24
Our analysis found only minor significant differences in prediction performance among the evaluated models. Significant differences were shown for the median prediction in the FEV1. One reason for this might be that the median prediction was only based on the lung function values and was used as a baseline for the other prediction models. This explains the low standard deviation of both as well as mean and median prediction.
The XGBoost-based prediction showed the FEV1 in the delta values and significantly lower prediction values when compared with the other models. That only significant differences were found in the FEV1 seems to be consistent when looking at the global results: all prediction models performed better in the prediction of FEV1/VC and RV/TLC than only FEV1.
One explanation for this and the aforementioned significant differences might be that the FEV1
is an explicit dynamical parameter and the qCT, even when performed in two breathing states, contains only static information. Therefore, the TLC, VC, and the RV may be easier to predict with regard to the initial static information. This theory is supported by the different datasets (inspiration, expiration, delta), which showed different MREs with regard to the parameters. This might be attributable to the physiological meaning of the different datasets. In terms of the expiratory dataset, if the patient is in maximal expiration, the volume measured in the expiratory CT is equivalent to the RV. The is similar to the inspiratory CT, which is, at maximal inspiration, equivalent to the TLC. Similarly, if we examine the delta dataset, the subtraction of inspiratory (TLC) and expiratory (RV) data, we receive the maximal mobilizable volume, i.e., the VC. Accordingly, the various MREs seem to be consistent because the lowest MRE for the FEV1
/VC was found in the delta dataset. This was similar to the RV/TLC, which was the lowest in the expiration dataset. Even if the differences were not statistically significant, the calculated results were in line with our physiological understanding of the different qCT datasets. Similar findings have been reported in previous studies [3
]. A larger cohort might confirm our hypothesis to a significant level. In summary, this highlights the importance of an additional scan in maximal expiration, as stated by the Fleischner Society and other studies [28
Most previous studies only correlated lung function values with qCT parameters but did not attempt to predict the lung function itself [3
]. Nevertheless, Gu et al. predicted lung function values based on the emphysema acquired by qCT. They calculated MREs in the prediction of the FEV1
% between 17% and 256% [32
]. Similar to our study, the prediction of FEV1
/VC performed better when compared with the FEV1
% prediction. In comparison with our approach, they only used one linear prediction model and only different subdivisions of the LAV as initial prediction factors.
In our analysis, the neural-network-based prediction did not perform significantly different when compared with the other mathematical models. With regard to the rise of neural-network-based models, this seems counterintuitive [10
]. A significant reason for this may be our sample size. As shown in previous work, the size of the initial training set is crucial for the performance of such a neural-network-based prediction model [33
]. A larger initial dataset will likely significantly improve the performance.
Our study contained several limitations, one of which was our relatively small sample size. A larger cohort will not only improve performance in the neural-network-based model but will also improve generalizability in all models. Further, we evaluated only a limited variety of algorithms with specific settings. A more comprehensive tuning of the profound parameters in XGBoost and the neural network-based model would most likely have improved the prediction performance. Further studies should evaluate the influence of different factors, for example, tree depths, network layers, or boosting rates.
A third problem was in the initial training dataset itself. We chose a real-life dataset of patients with diagnosed COPD. With regard to the heterogeneity of COPD and its airway obstruction states, a wide standard deviation is present. Therefore, with regard to the small sample size, the models had to adapt to a large range of values. Again, especially for such a heterogenic cohort, a larger sample would be beneficial.
In summary we were able to show that different prediction models can be used to predict lung function values from qCT in patients with COPD. Therefore, the best prediction performance was most likely attributable to the sample size, which was calculated for kNN and polynomial regression. Nevertheless, the prediction performance was generally not significantly different from the other models, suggesting they can also be used for similar objectives. Therefore, computationally incomprehensive models (e.g., KNN or polynomial regression) might offer valid alternatives in lung function prediction when compared with neural networks. Further evaluations with larger sample sizes might benefit the XGBoost and neural-network-based prediction. Nevertheless, the prediction performance of the models used in this work significantly overcomes previous, one-linear regression systems using explorations [32
Overall, the prediction of static parameters was superior when compared with the ones of dynamic parameters. Additionally, different breathing states showed different prediction performances with regard to individual parameters. In contrast to inspiration-only-based research, this underlines the need for an additional expiratory scan; further studies should take this as well as the calculated delta values into account [35
Although traditional, inexpensive lung function testing (e.g. spirometry) is not likely to be replaced by quantified computed tomography, there might be a future application for similar prediction models. Sites without a pneumology department or additional information from a noncontrast-enhanced chest CT in outpatient care would benefit from such a prediction system because lung function changes might be recognized earlier. Further, specific pneumological testing could be carried out after these incidental findings. As seen in this proof-of-concept work, quantified computed tomography can be used to predict lung function values in COPD. The distinct differences shown between dynamic and static parameters as well as the importance of separate inspiratory, expiratory, and delta datasets should be useful for research groups for future evaluation.