Predicting Non-Small-Cell Lung Cancer Survival after Curative Surgery via Deep Learning of Diffusion MRI

Background: The objective of this study was to evaluate the predictive power of a survival model using deep learning of diffusion-weighted images (DWI) in patients with non-small-cell lung cancer (NSCLC). Methods: DWI at b-values of 0, 100, and 700 s/mm2 (DWI0, DWI100, DWI700) were obtained preoperatively for 100 NSCLC patients who underwent curative surgery (57 men, 43 women; mean age, 62 years). The ADC0-100 (perfusion-sensitive ADC), ADC100-700 (perfusion-insensitive ADC), ADC0-100-700, and demographic features were collected as input data, and 5-year survival was collected as output data. Our survival model adopted transfer learning from a pre-trained VGG-16 network, whereby the softmax layer was replaced with a binary classification layer for the prediction of 5-year survival. Three input channels were selected in various combinations from the DWIs and ADC images, and the resulting accuracies and AUCs were compared under 10-fold cross validation to find the best-performing combination. Results: 66 patients survived, and 34 patients died. The predictive performance was best for the combination DWI0-ADC0-100-ADC0-100-700 (accuracy: 92%; AUC: 0.904). This was followed by DWI0-DWI700-ADC0-100-700, DWI0-DWI100-DWI700, and DWI0-DWI0-DWI0 (accuracy: 91%, 81%, 76%; AUC: 0.889, 0.763, 0.711, respectively). Survival prediction models trained with ADC performed significantly better than those trained with DWI only (p-values < 0.05). The survival prediction improved when demographic features were added to the models using only DWIs, but the benefit of clinical information was not prominent when it was added to the best-performing model using both DWI and ADC. Conclusions: Deep learning may play a role in the survival prediction of lung cancer. The performance of learning can be enhanced by providing established, proven functional parameters such as the ADC as input, rather than the original DWI data only.


Introduction
Lung cancer is the most common cause of cancer death, accounting for 26.6% of all cancer deaths [1]. The expected survival of lung cancer patients differs according to the stage at which the cancer is diagnosed. Non-small-cell lung cancer (NSCLC) with localized disease without regional or distant metastasis shows a 59.0% 5-year relative survival rate, whereas NSCLC with distant metastasis shows only a 5.8% 5-year relative survival rate [2]. The possibility of survival of lung cancer at the time of diagnosis can only be

Patients
The institutional review board of our institution approved this study as part of a clinical trial for the staging of lung cancer, which was registered as a randomized clinical trial with ClinicalTrials.gov number NCT01065415. Written informed consent was obtained from all patients in the single tertiary referral hospital. From January 2010 through November 2011, patients with stage I, II, or IIIA NSCLC (other than N2 disease) based on clinical staging underwent a conventional work-up including physical examination, laboratory tests, bronchoscopy, chest CT, or PET/CT upon admission (n = 151). Patients were excluded (n = 51) if they were unsuitable for surgery or for the study, for reasons such as poor pulmonary function, poor performance status (ECOG 3 or 4), concurrent medical diseases, a history of malignancy treatment, a contraindication for MR image acquisition, or refusal to participate. After MR image acquisition, thoracotomy with or without mediastinoscopy was performed, and 100 patients were included (57 men, 43 women; mean age, 62 years).
We evaluated age, sex, smoking history, tumor size, pathologic type, surgical stage of NSCLC (AJCC 7th), and survival information based on an electronic chart review. The cause-of-death statistics were updated annually by the National Statistical Office, and the electronic charts of cancer patients contained their updated survival information. Five-year survival was determined from the date of MR acquisition to the date of death or, for survivors, the last follow-up date on the chart.

Image Processing
DWI is used for the calculation of the ADC. To generate the perfusion-insensitive ADC by eliminating the pseudo-diffusion effect, the ADC was calculated from b-values of 100 and 700 s/mm2 (ADC100-700). The perfusion-sensitive ADC was calculated from b-values of 0 and 100 s/mm2 (ADC0-100). The overall conventional ADC was calculated from b-values of 0, 100, and 700 s/mm2 (ADC0-100-700). Specifically, the ADC value was calculated using a mono-exponential model [33]:
$$S(b) = S_0 \exp(-b \cdot \mathrm{ADC})$$
where S(b) is the signal intensity at a particular b-value, S0 is the signal intensity at b = 0 s/mm2, and b is the b-factor. The ADC value was estimated via linear fitting using Matlab (Mathworks, Natick, MA, USA). For each voxel, three ADC values (ADC0-100-700, ADC0-100, and ADC100-700) were estimated: one from the low b-values (slope between 0 and 100 s/mm2, ADC0-100; microperfusion-facilitated ADC), one from the high b-values (slope between 100 and 700 s/mm2, ADC100-700; perfusion-insensitive ADC), and one from all b-values (slope across 0, 100, and 700 s/mm2, ADC0-100-700; conventional ADC). The tumor ROI was manually defined on the axial ADC0-100-700 map. The voxels ranging from 2.5% to 7.5% of the ADC0-100-700 values within the tumor ROI were extracted and averaged to compute the ADC0-100-700 value. The corresponding voxels were used to compute the ADC0-100 and ADC100-700 values.
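As an illustration of the linear fitting step for the mono-exponential model, the following minimal Python sketch estimates the three ADC values for a single voxel as the negative least-squares slope of ln S(b) against b. It is not the Matlab code used in the study. The function names, the signal values, and the bi-exponential test voxel (with an assumed perfusion fraction `f` and pseudo-diffusion coefficient `d_star`) are hypothetical and serve only to show why ADC0-100 is more perfusion-sensitive than ADC100-700.

```python
import math

def fit_adc(b_values, signals):
    """Return the ADC as the negative least-squares slope of ln S(b) versus b."""
    logs = [math.log(s) for s in signals]
    n = len(b_values)
    mean_b = sum(b_values) / n
    mean_l = sum(logs) / n
    sxx = sum((b - mean_b) ** 2 for b in b_values)
    sxy = sum((b - mean_b) * (l - mean_l) for b, l in zip(b_values, logs))
    return -sxy / sxx

def biexponential_signal(b, s0=1000.0, f=0.1, d_star=0.02, d=1.0e-3):
    """Hypothetical voxel with a fast pseudo-diffusion (perfusion) component."""
    return s0 * (f * math.exp(-b * d_star) + (1.0 - f) * math.exp(-b * d))

b = [0, 100, 700]  # b-values in s/mm2, as in the acquisition protocol

# A purely mono-exponential voxel: all three estimates recover the same ADC.
mono = [1000.0 * math.exp(-bv * 1.2e-3) for bv in b]
mono_adc = fit_adc(b, mono)

# A voxel with a perfusion component: the low-b slope is steeper.
s = [biexponential_signal(bv) for bv in b]
adc_0_100 = fit_adc(b[:2], s[:2])      # perfusion-sensitive
adc_100_700 = fit_adc(b[1:], s[1:])    # perfusion-insensitive
adc_0_100_700 = fit_adc(b, s)          # conventional, all three b-values

print(adc_0_100, adc_100_700, adc_0_100_700)
```

With two b-values the fit reduces to the two-point slope, so the same function covers all three maps.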
The DWI and ADC images were normalized with respect to both signal intensity and pixel size before being used as input data. The signal intensity of the images was normalized to a range from 0 to 1, and all images were interpolated to a pixel size of 2 mm. The cancer was manually segmented on the ADC map by a radiologist (CAY, with 20 years of experience). From the manually segmented lung cancer volume, the slice with the largest area of the lung cancer was selected as a mask. Then, the lung cancer region of the DWI and ADC images (DWI0, DWI100, DWI700, ADC0-100, ADC100-700, and ADC0-100-700) was segmented using the selected mask. In our dataset, the maximum size of the lung cancer was found to be 34 × 30 pixels. The segmented images were padded to a size of 56 × 56 pixels and then resized to 224 × 224 pixels to be fed as input to our deep learning model.
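The normalization, padding, and resizing steps can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions: the text does not specify the padding value, the placement of the tumor within the padded image, the interpolation method used for resizing, or the order of the steps, so zero padding, centered placement, nearest-neighbor upsampling by a factor of 4, and normalize-then-pad are all assumptions here. The function names and the synthetic patch are hypothetical.

```python
def normalize(image):
    """Min-max normalize intensities to the range [0, 1]."""
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    span = (hi - lo) or 1.0  # avoid division by zero on a constant image
    return [[(v - lo) / span for v in row] for row in image]

def pad_center(image, size=56):
    """Place the patch at the center of a size x size zero-filled image."""
    h, w = len(image), len(image[0])
    top, left = (size - h) // 2, (size - w) // 2
    out = [[0.0] * size for _ in range(size)]
    for r in range(h):
        for c in range(w):
            out[top + r][left + c] = image[r][c]
    return out

def upsample(image, factor=4):
    """Nearest-neighbor upsampling: 56 x 56 becomes 224 x 224 for factor 4."""
    out = []
    for row in image:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out

# Synthetic patch at the largest tumor size reported (taken here as 34 rows x 30 columns).
patch = [[float(r * 30 + c) for c in range(30)] for r in range(34)]
prepared = upsample(pad_center(normalize(patch)))
print(len(prepared), len(prepared[0]))
```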

Deep Learning Model for Survival Prediction
In this paper, we propose a survival prediction model for lung cancer using deep learning, with transfer learning from VGG-16 as the backbone structure. VGG-16 is a convolutional neural network model proposed by K. Simonyan and A. Zisserman [34]. VGG-16 consists of sixteen layers: 13 convolutional layers, 2 fully connected layers, and 1 softmax layer for the output. The input of the network is a three-channel image at 224 × 224 resolution. When a three-channel image enters the model as input data, the feature maps of the network are generated through convolution operations that combine the three channels. In the original network, the output is a classification over the 1000 object classes of the ImageNet dataset through the softmax layer.
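To make the layer count concrete, the sketch below lists the standard VGG-16 convolutional configuration (number of output channels per 3 × 3 convolution, with "M" marking a 2 × 2 max-pooling step) and traces how a 224 × 224 × 3 input shrinks through the network. The configuration is the published VGG-16 layout; the variable and function names are hypothetical, and the sketch only tracks tensor shapes rather than computing any features.

```python
# Standard VGG-16 layout: integers are 3x3 convolutions (output channels),
# "M" is a 2x2 max-pooling layer that halves the height and width.
VGG16_CONV_CONFIG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
                     512, 512, 512, "M", 512, 512, 512, "M"]

def trace_shapes(height=224, width=224, channels=3):
    """Return the (height, width, channels) shape after every layer."""
    shapes = []
    for item in VGG16_CONV_CONFIG:
        if item == "M":
            height //= 2
            width //= 2
        else:
            # Same-padded 3x3 convolutions keep the spatial size unchanged.
            channels = item
        shapes.append((height, width, channels))
    return shapes

n_conv = sum(1 for item in VGG16_CONV_CONFIG if item != "M")
final_shape = trace_shapes()[-1]
print(n_conv, final_shape)
```

The 7 × 7 × 512 feature map at the end of the convolutional stack is what the fully connected layers and the final classification layer operate on.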
In this study, we modified the softmax layer of the VGG-16 model into a binary classification of survival and death. The architecture of the modified VGG-16 is described in Figure 1. When clinical information was added to our model, a fully connected layer was added in the latter part of the deep learning structure to evaluate the augmented performance for the prediction of survival in NSCLC patients.
We evaluated the predictive power of deep learning for three different types of input data: (1) DWI only, (2) DWI and ADC, and (3) DWI, ADC, and clinical information. As input data for the network, three channels of image data were selected from DWI and ADC. Combinations included DWI as anatomical data, ADC0-100 as perfusion-sensitive ADC, ADC100-700 as perfusion-insensitive ADC, and ADC0-100-700 as conventional ADC. Through training, the survival network could capture features related to the survival of the lung cancer patient from the input dataset. The output of the network was the survival probability of the lung cancer patient.
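The selection of three image channels can be illustrated with a small sketch that stacks three named maps into one height × width × 3 input. The helper `stack_channels`, the dictionary layout, and the tiny 2 × 2 maps are hypothetical and only demonstrate the channel arrangement; the channel-last ordering is an assumption.

```python
def stack_channels(maps, names):
    """Stack three named 2D maps into a height x width x 3 nested list."""
    if len(names) != 3:
        raise ValueError("exactly three channels are required")
    height = len(maps[names[0]])
    width = len(maps[names[0]][0])
    return [[[maps[name][r][c] for name in names] for c in range(width)]
            for r in range(height)]

# Tiny hypothetical maps standing in for the preprocessed tumor images.
maps = {
    "DWI0": [[0.1, 0.2], [0.3, 0.4]],
    "ADC0-100": [[0.5, 0.6], [0.7, 0.8]],
    "ADC0-100-700": [[0.9, 1.0], [0.0, 0.1]],
}

# A DWI-plus-ADC combination and a DWI-only combination named in the text.
dwi_adc_input = stack_channels(maps, ("DWI0", "ADC0-100", "ADC0-100-700"))
dwi_only_input = stack_channels(maps, ("DWI0", "DWI0", "DWI0"))
print(dwi_adc_input[0][0], dwi_only_input[1][1])
```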

Implementation
The models were implemented using TensorFlow (version 1.14). The VGG-16 model pretrained on ImageNet was used to obtain the initial parameters of our network. Training proceeded in two stages: first, the classification layer, the two fully connected layers, and three convolutional layers were trained at an initial learning rate of 0.001 for 70 epochs; then, all layers were fine-tuned at a learning rate of 0.00001 for 100 epochs, giving 170 epochs in total. Cross entropy was used as the loss function, and stochastic gradient descent was used as the optimizer. Data augmentation, consisting of flipping along the x and y axes and rotation (−30° to 30°), was performed during training. For the inputs of the model, three channels of images were used in various combinations of the DWI (DWI0, DWI100, DWI700) and the ADC maps (ADC0-100, ADC100-700, ADC0-100-700). Ten-fold cross validation was used to evaluate the survival prediction model. A total of 100 subjects were divided into 10 subsets of 10 subjects each. In each experiment, one subset (10 subjects) was used as the test set and the remaining nine subsets (90 subjects) were used as the training set. The accuracy of the model was reported as the average of the prediction accuracies from the 10 experiments.
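The fold layout and the two-stage learning rate can be written down compactly. The sketch below is illustrative only: the random shuffling, the seed, the zero-based epoch index, and the function names are assumptions, since the text does not state how subjects were assigned to subsets.

```python
import random

def ten_fold_splits(n_subjects=100, n_folds=10, seed=0):
    """Yield (train_ids, test_ids) pairs; each fold serves once as the test set."""
    ids = list(range(n_subjects))
    random.Random(seed).shuffle(ids)  # assumed random assignment
    fold_size = n_subjects // n_folds
    folds = [ids[k * fold_size:(k + 1) * fold_size] for k in range(n_folds)]
    for k in range(n_folds):
        test_ids = folds[k]
        train_ids = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train_ids, test_ids

def learning_rate(epoch):
    """Two-stage schedule: 0.001 for epochs 0-69, then 0.00001 for epochs 70-169."""
    return 0.001 if epoch < 70 else 0.00001

def mean_accuracy(fold_accuracies):
    """Reported accuracy is the average over the ten test folds."""
    return sum(fold_accuracies) / len(fold_accuracies)

splits = list(ten_fold_splits())
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```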

Statistical Analysis
Several commonly reported performance metrics, namely the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, kappa, accuracy, and balanced accuracy, were used to evaluate how well 5-year survival could be classified by deep learning models trained with different sets of input data as the predictor. Here, Cohen's kappa, computed from the confusion matrix, is a measure of the proportion of "true" agreement beyond that expected by chance, and the balanced accuracy was defined as the average of sensitivity and specificity to deal with the class imbalance problem [35].
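These definitions can be made explicit with a short sketch that derives each metric from the four cells of a binary confusion matrix. The counts used below are hypothetical and are not taken from the study's tables; only the class sizes (66 survivors, 34 deaths) match the cohort. Treating survival as the positive class is also an assumption.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute metrics from confusion-matrix counts (positive = survived)."""
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / n
    # Agreement expected by chance, from the marginal totals.
    p_chance = ((tp + fn) / n) * ((tp + fp) / n) + ((fp + tn) / n) * ((fn + tn) / n)
    kappa = (accuracy - p_chance) / (1.0 - p_chance)
    balanced_accuracy = (sensitivity + specificity) / 2.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "kappa": kappa,
        "balanced_accuracy": balanced_accuracy,
    }

# Hypothetical predictions for 66 survivors and 34 deaths.
metrics = classification_metrics(tp=60, fn=6, fp=4, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because the balanced accuracy averages the per-class rates, it is not inflated by the larger survivor group in the way plain accuracy can be.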
To quantify the uncertainty of the model's prediction accuracy, we calculated 95% confidence intervals (CIs) for each measurement using bootstrap resampling with 1000 replications. When the 95% CI for the difference in a given comparison did not include zero, we concluded that there was a difference between the two models. The association between MR images and the 5-year survival of lung cancer patients was tested via logistic regression analysis, adjusted for clinical information such as age, sex, smoking history, tumor size, pathologic type, and surgical stage. The optimal cutoff was calculated using Youden's index.
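The bootstrap comparison and the Youden cutoff can be sketched as follows. This is a minimal illustration, not the R or SAS code used for the analysis: the percentile method, paired resampling of patients, the seed, the function names, and all input values are assumptions. The per-patient correctness vectors are hypothetical; only their totals (92 and 76 correct out of 100) echo accuracies reported in the abstract.

```python
import random

def bootstrap_diff_ci(correct_a, correct_b, n_boot=1000, seed=0):
    """Percentile 95% CI for the accuracy difference (A minus B), resampling patients."""
    rng = random.Random(seed)
    n = len(correct_a)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(sum(correct_a[i] - correct_b[i] for i in idx) / n)
    diffs.sort()
    return diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot) - 1]

def youden_cutoff(probs, labels):
    """Return the cutoff maximizing J = sensitivity + specificity - 1."""
    best_cut, best_j = None, -1.0
    for cut in sorted(set(probs)):
        tp = sum(1 for p, y in zip(probs, labels) if p >= cut and y == 1)
        fn = sum(1 for p, y in zip(probs, labels) if p < cut and y == 1)
        tn = sum(1 for p, y in zip(probs, labels) if p < cut and y == 0)
        fp = sum(1 for p, y in zip(probs, labels) if p >= cut and y == 0)
        j = tp / (tp + fn) + tn / (tn + fp) - 1.0
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j

# Hypothetical per-patient correctness (1 = correct) for two models.
model_a = [0] * 8 + [1] * 92             # 92 of 100 correct
model_b = [1] * 4 + [0] * 24 + [1] * 72  # 76 of 100 correct
lower, upper = bootstrap_diff_ci(model_a, model_b)
print("95% CI for accuracy difference:", lower, upper)

# Hypothetical predicted survival probabilities and true outcomes.
probs = [0.1, 0.3, 0.4, 0.6, 0.8, 0.9, 0.95]
labels = [0, 0, 1, 0, 1, 1, 1]
cutoff, j_stat = youden_cutoff(probs, labels)
print("Youden cutoff:", cutoff, "J:", j_stat)
```

An interval that excludes zero corresponds to the decision rule described in the text for concluding that two models differ.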
All statistical analyses were carried out using R packages (version 3.6.1; R Development Core Team, www.r-project.org, accessed on 13 May 2022) and SAS (version 9.4; SAS Institute, Cary, NC, USA). All statistical tests were two-sided with a significance level of 0.05.

Performance of the Survival Prediction Model Using DWI and ADC
The best predictive performance (92% accuracy) was achieved by the model learning from the combination of DWI0-ADC0-100-ADC0-100-700 input data (Table 2). The models trained with at least one ADC map showed high accuracies (87 to 92%). On the other hand, the models trained with only DWIs showed low accuracies (76% with DWI0-DWI0-DWI0 and 81% with DWI0-DWI100-DWI700 input data), although this model structure integrated features from each DWI's input data (Figure 2). When the accuracies were compared, the model trained using DWI0-ADC0-100-ADC0-100-700 input data showed significantly better performance than the models trained with only DWIs, but there was no significant difference between the models using at least one ADC input datum (Table 2).
Looking at the individual cases that the models accurately predicted, the model using both ADC and DWI accurately predicted 9 additional cases which were not predicted accurately by the model using only ADC, and 12 additional cases which were not predicted accurately by the model using only DWI. On the other hand, the model using both ADC and DWI did not make correct predictions in four cases which were correctly predicted using ADC only, and in one case which was correctly predicted using DWI only. Three cases could not be predicted correctly by any of these three models (Figure 3).
[Figure caption, partially recovered:] The ADC value was 1.12 × 10−3 mm2/s, which could suggest poor prognosis, but the patient remained alive 5 years after curative surgery. The deep learning models with DWI-only combinations (DWI0-DWI0-DWI0, DWI0-DWI100-DWI700) failed to predict the survival, but the model with ADC input predicted the patient's survival correctly.

Performance of the Survival Prediction Model Using DWI, ADC, and Clinical Information
When clinical information (age, sex, smoking history, tumor size, pathologic type, and surgical stage) was added to the AI-generated survival predictions from diffusion MRI, the prediction improved for the models using only DWIs, but the benefit of clinical information was not prominent for the best-performing model using both DWI and ADC (Table 3). The best performance (94%) was achieved with a model using DWI0-ADC0-100-ADC0-100-700 and all of the clinical information as input data, which was slightly better than the accuracy of the model using DWI0-ADC0-100-ADC0-100-700 only (92%). In contrast, when clinical information was added to the models using DWI only (76 to 81% accuracy), the accuracy increased by 7 percentage points or more (83 to 89% accuracy).

Discussion
DWI and ADC of MR images reveal the diffusion capacity of water molecules and are widely used for oncologic imaging in terms of characterization, diagnosis, and prognosis prediction. Either a visual assessment of diffusion restriction, made by comparing the signal intensity of high- and low-b-value DWI, or a measured value below 1.5 × 10−3 mm2/s on the ADC map may suggest a poor prognosis for a patient. For example, based on these two assessments, radiologists could suggest a diagnosis of malignancy based on MR images, although the ADC range of lung cancer can vary [12,36]. Intense restriction on DWI and smaller ADC values can suggest poor prognosis in terms of higher pathologic grade, lymph node metastasis, and response to chemoradiation therapy, but there are no obvious criteria or cutoff values for differentiating survival from death in an individual NSCLC patient [37,38]. Such identification of diffusion restriction can help to estimate the probability of a better or poorer prognosis, but an individual (personalized) prediction of 5-year death or survival for a specific patient cannot be achieved via visual assessment or value measurement alone.
The prognostic prediction of NSCLC patients using deep learning models has been investigated with several types of biomarkers, such as radiologic, histopathologic, genetic, or molecular evidence [39][40][41][42]. In the field of pulmonary image analysis and prognosis prediction, several deep learning applications have been proposed for chest radiographs, CT, and PET/CT. Lu et al. demonstrated that deep learning chest radiograph risk scoring could stratify the mortality risk of individuals in the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial and the National Lung Screening Trial [43]. Hosny et al. demonstrated a mortality assessment on the CT images of NSCLC patients via deep learning [44]. Also, Baek et al. visualized the U-Net algorithm of PET/CT in NSCLC patients in a prediction of survival. To our knowledge, this is the first study demonstrating 5-year overall survival prognostication of NSCLC patients after curative surgery based on a deep learning model using DWI and ADC data of the tumor. The highest accuracy, 92%, was obtained by our model trained on both DWI and ADC input data.
Signal intensity loss on the diffusion-sensitive sequence can be quantified by calculating the ADC [45]. In principle, through non-linear transformation of the voxel values of each DWI, the deep learning model could generate ADC-like feature maps. However, in our study, we found that the deep learning model produced low-accuracy results using DWI images only. This could be due to the lack of training samples for weight optimization. The deep learning model could learn more efficiently when clinically significant parameters such as ADC0-100 and ADC0-100-700 were precalculated and then provided as input data, rather than learning directly from the diffusion images. These clinically significant parametric maps enhance the predictive power of the deep learning model in cases of limited training samples. In contrast, because of the "black box" nature of deep learning models, it is unclear whether the models trained solely with DWI0, DWI100, and DWI700 images had extracted clinically relevant ADC data from the DWIs when making their predictions. In the case that the models had not extracted ADC from DWI, both the accuracy and the reliability of the model declined significantly. Our solution to this problem was to directly provide the ADC (an established functional parameter which reflects cellular density) to the deep learning model, so that we could be assured that the deep learning model incorporated the ADC in its decision making.
A regression model combining clinical information with the AI-generated predictions from diffusion MRI improved on the accuracy of the diffusion MRI models alone. The benefit of the clinical information was prominent for the relatively low-performing deep learning models using DWIs only, but not for the best-performing deep learning model using both DWIs and ADCs, which already showed a high accuracy of 92%. It would be difficult to further increase this high accuracy with the limited amount of data in our current study.
Our study is limited by the small size of the dataset. To deal with this limitation, we applied three techniques in building and evaluating the survival prediction model. Firstly, ten-fold cross validation was performed. The cross validation technique could minimize the problem of overfitting that may occur with a small dataset [46,47]. In this study, the data were divided into training and test sets at a ratio of 9:1, and the validation was repeated 10 times with a different test subset each time, to maximize the amount of data that could be learned from the 100 patients. Secondly, transfer learning was adopted to handle possible problems such as overfitting or a lack of data. In this study, the model was trained by reusing the parameters of the pre-trained VGG-16, and the number of weights to be optimized was reduced. Lastly, data augmentation was performed during training to avoid the overfitting problem. Data augmentation is a well-known approach for improving the generalization of a deep learning model. In this study, flip and rotation functions were used for data augmentation, as detailed in the Methods section.
For the interpretation of deep learning models, previous studies have shown promising results [48][49][50]. The class activation map (CAM), for example, provides the location of the pixels within an image that contribute to the CNN's prediction of the image class [50]. Using the CAM, we could understand which parts of the image had more of an effect on the final output of the deep learning model. In this study, however, we could not apply the CAM to our modified VGG-16 model, due to the limitations of the deep learning architecture and the transfer learning strategy.

Conclusions
In conclusion, deep learning may play a role in the survival prediction of lung cancer. The accuracy of the deep learning model can be enhanced by providing established, proven functional parameters such as the ADC as input, in addition to the raw DWI data. The novelty of this paper lies not only in creating a new deep learning model, but also in our use of diffusion MRI data to predict survival in non-small-cell lung cancer patients, a clinical application that, to our knowledge, has not been attempted before in lung cancer survival prediction research.

Institutional Review Board Statement: The institutional review board of our institution approved this study (approval code: NCT01065415; approval date: 2010-02) as part of a clinical trial for the staging of lung cancer, which was registered as a randomized clinical trial with ClinicalTrials.gov number NCT01065415.
Informed Consent Statement: Written informed consent was obtained from all of the patients in the single tertiary referral hospital.
Data Availability Statement: Not applicable.