Deep-Learning-Based Predictive Imaging Biomarker Model for EGFR Mutation Status in Non-Small Cell Lung Cancer from CT Imaging

Simple Summary: Deep-learning-based radiogenomic (DLR) models show promising performance in assisting with lung cancer care. The primary aim of our study was to develop and validate a DLR model to predict EGFR mutation status in non-small-cell lung cancer (NSCLC) patients. Using 990 patients from two clinical trials, the study employed a machine learning pipeline that analysed CT images with manually selected tumour regions. Two deep convolutional neural networks segmented lung masses and nodules from 3D regions of the CT image. The combined radiomics and DLR model achieved an area under the curve (AUC) of 0.88 in predicting EGFR mutations, outperforming the individual models. The semantic features extracted from CT images also contributed to accurate predictions. The study suggests that this AI-based model, in combination with CT semantic features, could serve as a non-invasive biomarker that aids in predicting EGFR mutation status.
Abstract: Purpose: We aimed to develop and validate deep-learning-based radiogenomic (DLR) models and radiomic signatures to predict EGFR mutation in patients with NSCLC, and to assess the semantic and clinical features that can contribute to detecting EGFR mutations. Methods: Using 990 patients from two NSCLC trials, we employed an end-to-end pipeline analysing CT images without precise segmentation. Two 3D convolutional neural networks segmented lung masses and nodules. Results: The combined radiomics and DLR model achieved an AUC of 0.88 ± 0.03 in predicting EGFR mutation status, outperforming the individual models. On a subset of 223 patients, adding semantic features improved the AUC of the DL model from 0.84 ± 0.02 to 0.88 ± 0.05. CT semantic features that were found to be significantly associated with EGFR mutations were pure solid tumours with no associated ground glass component (p < 0.03), the absence of peripheral emphysema (p < 0.03), the presence of pleural retraction (p = 0.004), the presence of fissure attachment (p = 0.001), the presence of metastatic nodules in both the tumour-containing lobe (p = 0.001) and the non-tumour-containing lobe (p = 0.001), the presence of ipsilateral pleural effusion (p = 0.04), and average enhancement of the tumour mass above 54 HU (p < 0.001). Conclusions: This AI-based radiomics and DLR model demonstrated high accuracy in predicting EGFR mutation, and could serve as a non-invasive and user-friendly imaging biomarker for EGFR mutation status prediction.


Introduction
Non-small-cell lung cancer (NSCLC) accounts for the majority (85%) of all lung cancer cases. The two most common histopathologic subtypes are adenocarcinoma and squamous cell carcinoma [1]. In the modern era of personalized and precision medicine, the mutational testing of selected genes for NSCLC remains a standard practice to categorize patients into responders and non-responders. This includes testing for mutations of the epidermal growth factor receptor (EGFR), a cell surface receptor activating cell growth and survival, which, when mutated, confers sensitivity to tyrosine kinase inhibitors. Some common clinical characteristics seen in patients with EGFR mutations are non-smoking status, adenocarcinoma histology, female sex, and East Asian ethnicity [2,3].
A Trucut biopsy of the lung mass is required for the histological and mutational analysis that guides treatment planning. However, it is not always feasible to obtain adequate tissue samples from a biopsy for mutational analysis, and there might be errors in targeting the lung mass. Some high-risk patients might not be fit to undergo a Trucut biopsy due to a deranged coagulation profile or other underlying morbidities. There is always a small risk of life-threatening complications associated with biopsies, such as pneumothorax, haemoptysis due to alveolar haemorrhage, haemothorax, and haemopericardium. Also, in developing countries like India, advanced laboratory facilities for genomic mutation studies are not widely available, especially in small towns and rural areas. In this situation, clinical parameters such as Asian ethnicity, female sex, non-smoking status, and adenocarcinoma histology have been considered as potential indicators of the presence of EGFR mutation [4,5]. However, these clinical characteristics identify only a small, selected population with a higher probability of harbouring the EGFR mutation. The tumour cells harvested via a core biopsy represent only a tiny fraction of the tumour, and might not represent the complex heterogeneity of the tumour mass. A study conducted by Taniguchi et al. [6] analysed 50-60 areas of tumour tissue in 21 patients with a known EGFR mutation. Intra-tumoural heterogeneity was seen in 28.6% of the study cohort (6 out of 21 tumours), which contained both EGFR-mutated and wild-type cells.
Thus, additional sources of information are needed to assess EGFR mutation status, such as the characterisation and analysis of quantitative computed tomography (CT) features. Every patient with a lung mass needs to undergo a CT scan; hence, pre-treatment CT images can prove to be a rich source of data for analysis. They can provide additional data for genomics and can potentially identify tumours with EGFR mutations [7,8]. Resampling of the tumour can be considered if there are discrepancies between the mutation results from the biopsy and the CT findings (based on deep learning, radiomics, and semantic markers); these combined analyses can potentially reduce the chances of missing EGFR mutations in a tumour mass.
Medical imaging is intuitively very suitable as a biomarker source, especially in lung cancer patients, where it is used to visualize tumour phenotypes and predict treatment response. There has been an increase in research on the characterisation of quantitative imaging features reflecting tumour biology, physiology, and phenotype using artificial intelligence (AI)-based algorithms. Radiomics and deep-learning (DL) AI-based models are extensively used with medical imaging [9][10][11]. Radiomics refers to the computerized extraction of data from radiologic images, and provides unique potential for making lung cancer screening more rapid and accurate by using machine learning algorithms. For analysing the tumour area, radiomics models require the precise annotation of the tumour boundary, which requires manually marking the tumour on all three planes [12,13]. Since only the tumour area is taken into consideration, the microenvironment of the surrounding lung parenchyma is ignored. Advanced AI models such as neural network-based DL methods can overcome these limitations through a self-learning strategy, and present a promising tool for genomic analysis [14][15][16][17]. The DL method can be likened to the functioning of the neural network in the brain. In comparison to radiomic methods, precise tumour boundary annotation is not required with deep learning, thus saving considerable time and human effort. Furthermore, the DL method takes into consideration the microenvironment of the surrounding lung parenchyma, and can extract features that are adaptive to specific clinical outcomes, whereas radiomics can only describe general features that lack specificity for outcome prediction [18][19][20]. Moreover, with the help of the DL model, the sub-areas within the tumour that are strongly related to EGFR mutation status can be identified and further subjected to biopsy if required. Thus, both methods can directly or indirectly help clinicians make rapid treatment decisions for patients.
The primary purpose of this study is to develop radiomics and DL models that can mine data from CT images to predict EGFR mutation status using a large cohort of patients with NSCLC. Our DL method is an end-to-end pipeline that requires only the manual marking of the tumour region in a CT image without precise annotation [21]. We also identified specific CT-based semantic features that correlate strongly with the presence of positive EGFR mutation in our study population.

Materials and Methods
The study was approved by the Institutional Ethics Committee, and a waiver for consent was obtained in view of the retrospective nature of the study. The inclusion criteria for enrolling patients into the study were (1) primary lung adenocarcinoma confirmed on histopathology report; (2) the presence of proven records of EGFR mutation status; and (3) pre-operative/baseline contrast-enhanced CT data available. Exclusion criteria were (1) incomplete medical records or non-availability of digital DICOM CT images; (2) patients who had received chemotherapy or radiotherapy outside our institute before the baseline scan; and (3) any other active illness or pathological condition that might interfere with the study data as per medical records. Finally, 990 patients were included in the study, with patient cases evaluated from January 2010 to December 2016. These patients were accrued from two clinical trials, which evaluated the role of Gefitinib vs. Pemetrexed and Carboplatin in the treatment of EGFR-mutated NSCLC [22,23].
The tumour specimens were obtained using CT-guided Trucut biopsy, with the biopsy targeting the enhancing solid components of the tumour. EGFR mutations were identified on four tyrosine kinase domain exons (exons 18-21), which carry the common mutations in lung cancer. The mutation status was determined using a TaqMan Probe-Based Endpoint Genotyping Mutation Analysis undertaken via Real-Time PCR on the LC 480 II platform. A tumour was identified as EGFR mutant if a mutation was present in any one of exons 18-21; otherwise, the tumour was identified as EGFR wild-type. The focus of the study was predicting the EGFR mutation status.

Radiology Review
A clinical radiologist with 10 years of experience in thoracic imaging and another radiologist with 2 years of experience in general radiology retrospectively reviewed the CT scans. Both radiologists were blinded to clinical and histologic findings. The imaging review was performed on reconstructed DICOM data using a volume viewer integrated within the PACS. The images were reviewed in lung, soft-tissue, and bone windows with reformatting available in all three planes, i.e., axial, coronal and sagittal. In case of any disagreements between the radiologists as regards the CT findings, the majority class was used as the final CT feature. Mean values were used for continuous variables. A subset of 223 patients was selected for the extraction of pre-determined semantic features from CT. The clinical details of the same subset of patients were also collected.

Development of the DL Model
Using a convolutional neural network, DL aims to learn the abstract mapping between the raw data and the desired label. Our DL model for EGFR mutation classification is a linear support vector machine (SVM), which takes in 9 different feature vectors extracted from 6 DL models and uses them to classify the EGFR mutation status. The 6 DL models include 2 models trained to segment masses, 2 models trained for nodule texture classification, 1 model for nodule spiculation classification, and another model for nodule segmentation. The selection of the ROI for the DL model and its illustration are shown in Figures 1 and 2. Table 1 shows the list of 9 feature vectors, the models used and their combinations. For each model, a patch was extracted around the largest mass in the study. The DL model was constructed using the following frameworks: Python 3.8, PyTorch 1.1.
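The patch extraction step mentioned above can be illustrated with a minimal NumPy sketch. The function name `extract_patch`, the patch size, and the padding value of -1000 HU (air) are assumptions made for illustration; the text does not state the patch dimensions or how patches near the volume border were handled. The sketch also assumes that a rough binary marking of the largest mass is already available.

```python
import numpy as np

def extract_patch(volume, mask, size=(64, 64, 64), pad_value=-1000):
    """Crop a fixed-size 3D patch centred on the bounding box of a marked region.

    volume    : 3D array of CT intensities (HU)
    mask      : 3D array of the same shape, non-zero where the mass was marked
    size      : patch size in voxels (hypothetical default)
    pad_value : value used where the patch extends past the volume (assumed air)
    """
    coords = np.argwhere(mask > 0)
    if coords.size == 0:
        raise ValueError("mask is empty")
    # Centre of the bounding box of the marked region
    centre = (coords.min(axis=0) + coords.max(axis=0)) // 2
    size = np.asarray(size)
    start = centre - size // 2
    stop = start + size
    # Pad the volume so the crop never runs off its edges
    pad_before = np.maximum(-start, 0)
    pad_after = np.maximum(stop - np.asarray(volume.shape), 0)
    padded = np.pad(volume, list(zip(pad_before, pad_after)),
                    constant_values=pad_value)
    start = start + pad_before  # indices shift after padding
    slices = tuple(slice(int(s), int(s + n)) for s, n in zip(start, size))
    return padded[slices]

# Toy example: a mass marked in one corner of a small volume
vol = np.zeros((20, 20, 20), dtype=np.int16)
vol[0, 0, 0] = 7
marking = np.zeros_like(vol)
marking[0:4, 0:4, 0:4] = 1
patch = extract_patch(vol, marking, size=(8, 8, 8))
print(patch.shape)
```

Padding rather than shifting the crop keeps the mass centred in every patch, which is one reasonable choice when the same patch is fed to several networks.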

For all the segmentation models, we used a standard 3D U-Net architecture, and a 512-dimensional feature vector was taken from the bottom-most layer of the U-Net. Figure 3 shows the architecture of a standard U-Net with 512 features at the bottom-most layer. For all the classification tasks, a standard 3D Wide ResNet was used. The feature vector was extracted from the conv4 layer, immediately before the average-pooling (AveragePool3D) layer. Figure 4 shows the structure of a standard 3D Wide ResNet.
Both the mass segmentation networks were trained using the data of the above 990 patients. All the masses were annotated by a technologist and were reviewed by an experienced radiologist. For the nodule segmentation, texture classification and spiculation classification networks, 1010 studies from the publicly available LIDC-IDRI dataset were used for training. All these models were trained with a learning rate of 1 × 10⁻⁴ and a weight decay of 1 × 10⁻⁵. Segmentation networks were trained with Negative-Log Likelihood (NLL) loss and classification networks were trained with Cross Entropy loss. Augmentations such as rotate, flip, scale and translate were used for all the models. Once all the 9 feature vectors had been extracted, we generated a combined feature vector of 4032 features. Of these 4032 features, 789 feature columns were zero, leaving a feature vector of size 3243. An SVM with linear kernel and balanced weights for the 2 classes was trained on the 3243 features of 990 patients with 3-fold cross-validation. Using the coefficients of the best model obtained, we removed all the feature columns with coefficients ≤ N. We iterated on various values for N and found that N = 0.04 yielded the best model. After using N = 0.04 to remove the small coefficients, we had 1422 features left. We retrained the SVM with the new set of features to generate our best model.
On the subset with semantic features, the same experiments with the same set of 9 feature vectors were conducted with and without the semantic features.
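The feature-selection procedure described in this section (drop all-zero columns, fit a linear SVM with balanced class weights, discard features with small coefficients, then retrain with 3-fold cross-validation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: the use of scikit-learn, the function name `select_and_refit`, the use of the absolute coefficient value for thresholding, and the synthetic data are all assumptions.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

def select_and_refit(X, y, threshold, seed=0):
    """Coefficient-threshold feature selection with a linear SVM.

    Returns a boolean mask over the non-zero columns and the 3-fold
    cross-validated AUCs of the SVM retrained on the kept features.
    """
    # Step 1: remove feature columns that are zero for every patient
    nonzero = np.any(X != 0, axis=0)
    X = X[:, nonzero]
    # Step 2: initial linear SVM with balanced class weights
    svm = SVC(kernel="linear", class_weight="balanced")
    svm.fit(X, y)
    # Step 3: keep features whose |coefficient| exceeds the threshold N
    # (assumption: the absolute value is used)
    keep = np.abs(svm.coef_.ravel()) > threshold
    # Step 4: retrain on the reduced feature set with 3-fold cross-validation
    cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=seed)
    aucs = cross_val_score(
        SVC(kernel="linear", class_weight="balanced"),
        X[:, keep], y, cv=cv, scoring="roc_auc",
    )
    return keep, aucs

# Synthetic stand-in for the feature matrix (not study data)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(120, 30))
X_demo[:, 5] = 0.0                       # one all-zero column
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(int)
keep, aucs = select_and_refit(X_demo, y_demo, threshold=0.05)
print(int(keep.sum()), "features kept; mean AUC", round(float(aucs.mean()), 3))
```

For simplicity, the sketch selects features on the full dataset before cross-validating, which can make the resulting AUCs optimistic; nesting the selection inside each fold avoids this.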

Development of Radiomics Model
The primary tumour was segmented using the following techniques: manual, semi-automated and automated segmentation methods. The primary tumour on contrast-enhanced CT was delineated manually using the post-processing software (AW 4.4) by a radiologist with ten years of experience in thoracic imaging. The tumour was first annotated in the mediastinal window (W 330 HU; L 50 HU) to include only the tumour area by identifying boundaries with the chest wall and other soft tissues, as shown in Figure 5, then in the lung window (W 1500 HU; L −600 HU) to delineate the maximum extent of the lung parenchyma.
The segmented image was then subjected to the extraction of radiomics features using pyradiomics [24]. A total of 1110 radiomic features were calculated, divided into five groups: tumour intensity (n = 19), texture (n = 95), wavelet (n = 912), Laplacian of Gaussian (n = 74), and shape (n = 19). Emphasis was placed on the features from the previously published prognostic radiomic signatures: (I) tumour intensity: "Energy", (II) texture: "Gray Level Nonuniformity", (III) wavelet: "Gray Level Nonuniformity HLH", and (IV) shape: "Compactness". Our radiomics model for EGFR mutation classification is a linear SVM, which takes in the 1110 radiomic features and predicts the presence of EGFR mutation. No feature columns with all zeros were identified. An SVM with linear kernel and balanced weights for the 2 classes was trained on the 1110 features of 990 patients with 3-fold cross-validation. Using the coefficients of the best model obtained, we removed all the feature columns with coefficients ≤ N. We iterated on various values for N and found that N = 0.1 yielded the best model. After using N = 0.1 to remove the small coefficients, we had 200 features left. We retrained the SVM with the new set of features to generate our best model. Table 2 shows the top-performing radiomics features in predicting EGFR mutation. Figure 6 shows the pattern of the radiomic workflow. Figure 7 shows the covariance matrix of radiomic features.
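The window settings quoted in this section translate into intensity ranges through the usual width/level convention: a window of width W and level L displays Hounsfield values from L − W/2 to L + W/2. The short sketch below shows this mapping; the function name `apply_window` and the rescaling to [0, 1] are illustrative choices, not part of the described workflow.

```python
import numpy as np

def apply_window(hu, width, level):
    """Clip HU values to a display window and rescale them to [0, 1]."""
    lo = level - width / 2.0
    hi = level + width / 2.0
    return (np.clip(hu, lo, hi) - lo) / (hi - lo)

# Mediastinal window (W 330, L 50) covers -115 to 215 HU
mediastinal = apply_window(np.array([-115.0, 50.0, 215.0]), width=330, level=50)
# Lung window (W 1500, L -600) covers -1350 to 150 HU
lung = apply_window(np.array([-2000.0, -600.0, 150.0, 400.0]), width=1500, level=-600)
print(mediastinal, lung)
```

The narrow mediastinal window separates the tumour from the chest wall and soft tissue, while the wide lung window shows how far the lesion extends into the aerated parenchyma.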

Combining DL and Radiomic Features
We combined the 4032 features from the DL models and the 1110 radiomic features and trained an SVM to predict the EGFR mutation. Similar to the above methods, the coefficients of the initial SVM were used to identify the best contributing features. For this model, N = 0.05 gave the best results. At N = 0.05, we were left with the top 2000 features, which were then used to retrain the SVM to generate our best model.
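A minimal sketch of the concatenation step is given below. It stacks the per-patient DL and radiomic feature blocks column-wise and keeps a name for every column so that features surviving the coefficient threshold can be traced back to their source. The function name `combine_feature_blocks`, the block labels, and the placeholder arrays are assumptions for illustration only.

```python
import numpy as np

def combine_feature_blocks(blocks):
    """Concatenate named feature blocks column-wise.

    blocks : dict mapping a block name to a (patients x features) array
    Returns the combined matrix and one name per column.
    """
    names, arrays = [], []
    for name, arr in blocks.items():
        arr = np.asarray(arr, dtype=float)
        arrays.append(arr)
        names.extend(f"{name}_{i}" for i in range(arr.shape[1]))
    if len({a.shape[0] for a in arrays}) != 1:
        raise ValueError("all blocks must describe the same patients")
    return np.hstack(arrays), names

# Placeholder arrays with the feature counts quoted in the text
dl_block = np.zeros((4, 4032))
radiomic_block = np.ones((4, 1110))
X_combined, column_names = combine_feature_blocks(
    {"dl": dl_block, "radiomics": radiomic_block}
)
print(X_combined.shape)
```

With the counts given in the text, the combined matrix has 4032 + 1110 = 5142 columns before any thresholding.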

Statistical Analysis
Statistical analysis was performed using SPSS version 21 (IBM, Armonk, NY, USA). Data were descriptively analysed using frequency and percentage for categorical data. Interobserver agreement was determined by calculating Kappa values. A chi-square test for independence was used to assess whether there was an association between two variables. The Mann-Whitney U test was used to compare the medians between the two groups. Univariate binomial logistic regression was used to determine the predictive factors for EGFR mutation. Multiple logistic regression analyses were performed to identify independent factors that can be used to predict EGFR mutation status. The final model was selected with the backward elimination method. Area under the curve (AUC) and Receiver Operating Characteristic (ROC) curves were used to present the accuracy of different predictive models. All statistics were 2-sided, and a value of p < 0.05 was considered statistically significant.
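As an illustration of the interobserver agreement measure named above, Cohen's kappa compares the observed agreement p_o between two readers with the agreement p_e expected by chance: κ = (p_o − p_e) / (1 − p_e). The study used SPSS; the NumPy sketch below is only an equivalent calculation on made-up ratings, and the function name `cohen_kappa` is an assumption.

```python
import numpy as np

def cohen_kappa(reader1, reader2):
    """Cohen's kappa for two readers rating the same cases."""
    r1 = np.asarray(reader1)
    r2 = np.asarray(reader2)
    if r1.shape != r2.shape:
        raise ValueError("both readers must rate the same cases")
    labels = np.union1d(r1, r2)
    p_observed = np.mean(r1 == r2)
    # Chance agreement from each reader's marginal label frequencies
    p_chance = sum(np.mean(r1 == c) * np.mean(r2 == c) for c in labels)
    if p_chance == 1.0:
        return 1.0  # both readers used a single identical label throughout
    return float((p_observed - p_chance) / (1.0 - p_chance))

# Made-up presence/absence ratings of one semantic feature in 10 cases
reader_a = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
reader_b = [1, 1, 0, 0, 1, 0, 0, 1, 0, 1]
kappa = cohen_kappa(reader_a, reader_b)
print(round(kappa, 2))
```

In this toy example the readers agree on 8 of 10 cases (p_o = 0.8) with a chance agreement of 0.5, giving κ = 0.6.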

Patient Characteristics
The relevant clinical data of a subset of 223 subjects are given in Table 3.
Note to Table 3: OR = odds ratio; CI = confidence interval. Data in square brackets [] are percentages and data in parentheses () are ranges. ^ The p value was based on a comparison between the EGFR mutation group and the wild-type group. Data in braces {} are 95% confidence intervals (CIs).

Correlation of EGFR Mutation Status with Clinical Features
The median age of patients did not differ between the EGFR wild-type and EGFR mutant groups (p = 0.095) (see Table 1). EGFR mutation rates were significantly higher (a) in women than in men (p < 0.001) and (b) in non-smokers than in smokers (p < 0.001). Statistical analysis also revealed that Stage III disease was more frequently seen with EGFR wild-type tumours (p < 0.008).

Correlation of EGFR Mutation Status with Semantic Features
Semantic features were extracted for 223 patients. The semantic features are outlined in Table 4. Of 28 CT semantic features, univariate analysis (Table 4) revealed that the following CT features were significantly associated with harbouring EGFR mutation in NSCLC patients: (a) pure solid tumours with no associated ground glass component (p < 0.03), (b) the absence of peripheral emphysema (p < 0.03), (c) the presence of pleural retraction (p = 0.004), (d) the presence of fissure attachment (p = 0.001), (e) the presence of a metastatic nodule in the tumour-containing lobe (p = 0.001), (f) the presence of a metastatic nodule in the non-tumour-containing lobe (p = 0.001), (g) the presence of ipsilateral pleural effusion (p = 0.04), and (h) the average enhancement of the tumour mass above 54 HU (p < 0.001).
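To make the kind of association test behind these p values concrete, the sketch below computes a Pearson chi-square test of independence for a 2 × 2 table (feature present/absent against EGFR mutant/wild-type). It is an illustration only: the counts are invented, the function name `chi_square_2x2` is hypothetical, no continuity correction is applied, and the study itself used SPSS. For one degree of freedom, the p value follows from the complementary error function, p = erfc(√(χ²/2)).

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2 x 2 table.

    table : [[a, b], [c, d]] of observed counts
    Returns (chi-square statistic, two-sided p value with 1 degree of freedom).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")
    chi2 = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Invented counts: rows = feature present/absent, columns = mutant/wild-type
chi2, p_value = chi_square_2x2([[10, 20], [20, 10]])
print(round(chi2, 3), round(p_value, 4))
```

A p value below the 0.05 threshold stated in the Statistical Analysis section would mark the feature as significantly associated with mutation status in univariate analysis.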

Radiomics Model Used in Predicting EGFR Mutation
The SVM model for EGFR mutation classification using radiomic features had an AUC of 0.72 ± 0.03 with three-fold cross-validation on 990 studies. Figure 8a shows the ROC curve of the model.

DL Model in Predicting EGFR Mutation
The SVM model for EGFR mutation classification using the nine feature vectors generated from DL models had an AUC of 0.82 ± 0.01 (CI: 0.81, 0.83) with three-fold cross-validation on 990 patients. Figure 8b shows the ROC curves for the three repetitions and their respective AUCs. On the subset of 223 cases with semantic features, the model had an AUC of 0.84 ± 0.02 (CI: 0.82, 0.86) without any semantic features, whereas the model with semantic features had an AUC of 0.88 ± 0.05 (CI: 0.83, 0.93). The AUC thus improved by 0.04 (4 percentage points) following the addition of semantic features, irrespective of the value of N used. Figure 8c,d show the ROC curves for the smaller subset of cases without and with semantic features added.
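The "mean ± spread" form in which the AUCs are reported can be reproduced from per-fold results as sketched below. The AUC is computed with the rank (Mann-Whitney) formulation, and the fold values are then summarised as mean, standard deviation, and a mean ± SD interval. The intervals quoted in the text coincide numerically with mean ± SD, but whether they were derived this way is an assumption, as are the function names, the use of the sample standard deviation, and the example fold values.

```python
import numpy as np

def roc_auc(y_true, scores):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    n_pos = int(y_true.sum())
    n_neg = int((~y_true).sum())
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    # Tied scores share the average of their ranks
    for value in np.unique(scores):
        tied = scores == value
        ranks[tied] = ranks[tied].mean()
    return float((ranks[y_true].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg))

def summarise_folds(fold_aucs):
    """Mean, sample standard deviation, and mean +/- SD interval of fold AUCs."""
    fold_aucs = np.asarray(fold_aucs, dtype=float)
    mean = float(fold_aucs.mean())
    sd = float(fold_aucs.std(ddof=1))
    return mean, sd, (mean - sd, mean + sd)

# Small worked example of the AUC itself
auc_demo = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
# Hypothetical per-fold AUCs (not values from the study)
mean_auc, sd_auc, interval = summarise_folds([0.80, 0.85, 0.90])
print(auc_demo, round(mean_auc, 2), round(sd_auc, 2))
```

With only three folds the standard deviation is itself a rough estimate, so intervals of this kind are best read as an indication of fold-to-fold variability rather than a formal confidence interval.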


Combining DL and Radiomic Features
The SVM model for EGFR mutation classification using the nine feature vectors generated from DL models and the 1110 radiomic features had an AUC of 0.88 ± 0.03 with three-fold cross-validation on 990 studies. Figure 9 shows the ROC curve of the model.

Discussion
In this study, we assessed the role of AI-based radiomics and DL models using pre-treatment CT images of patients with lung adenocarcinoma to predict the EGFR mutation status.The DL model was trained using the CT images of 990 patients with three-fold cross-validation.The radiomics model showed good predictive performance with an AUC of 0.72 ± 0.03 with three-fold cross-validation on 990 studies.The SVM model generated from DL models had an AUC of 0.82 ± 0.01 with three-fold cross-validation on 990 studies.On a smaller subset of 223 cases for which semantic features were extracted, the DL model had an AUC of 0.84 ± 0.02, which improved to an AUC of 0.88 ± 0.05 when the semantic features were combined.There was a 4% improvement in AUC following the addition of semantic features.Cases with marked heterogeneity in the tumour, in cases with large tumour sizes and associated collapse or consolidation, showed the reduced accuracy of the DL and radiomics model.This resulted in increased error in the manual as well as automatic annotation of tumours, including the annotation of non-tumoural segment, and by extension, errors in feature ex-

Discussion
In this study, we assessed the role of AI-based radiomics and DL models in predicting EGFR mutation status from pretreatment CT images of patients with lung adenocarcinoma. The DL model was trained on the CT images of 990 patients with three-fold cross-validation. The radiomics model achieved an AUC of 0.72 ± 0.03 with three-fold cross-validation on 990 studies. The SVM model built on the features generated by the DL models had an AUC of 0.82 ± 0.01 with three-fold cross-validation on 990 studies. On a smaller subset of 223 cases for which semantic features were extracted, the DL model had an AUC of 0.84 ± 0.02, which improved to 0.88 ± 0.05 when the semantic features were added, an absolute gain of 0.04 in AUC. The accuracy of the DL and radiomics models was reduced in cases with marked tumour heterogeneity, particularly large tumours with associated collapse or consolidation. These cases showed increased error in both manual and automatic tumour annotation, including the annotation of non-tumoural segments, and by extension errors in feature extraction and texture analysis. With increasing tumour size and heterogeneity, the internal characteristics of tumours related to a specific genetic mutation are lost, which results in inaccurate training of the model and reduced predictive performance.
Other studies have shown a similar utility of DL models, with improvements in the AUC when combined with clinical parameters [25–28]. Further, a study of a PET/CT fusion algorithm using a dataset of 150 patients reported a prediction accuracy for EGFR versus non-EGFR mutations of 86.25% in the training dataset and 81.92% in the validation set [29].
Clinical utility of the DL model: Our analysis provides an effective alternative method to assess EGFR mutation in patients with NSCLC without requiring any intervention. It can also act as an effective supplement to a biopsy, help avoid complications associated with biopsies, and reduce false-negative biopsy results due to tumour heterogeneity. When a biopsy is negative but the deep learning model shows a high probability of EGFR mutation, re-biopsy can be attempted. The DL method can further assist in selecting the target area for biopsy. Since the human assistance required is minimal, a large amount of data can be processed quickly and with few errors. The model is easy to use and apply at various levels of healthcare settings. The deep learning model only requires routinely acquired CT images, without adding any extra cost. Therefore, this model can be used repeatedly throughout the course of treatment.
Limitations: The study was conducted on a population from a single tertiary healthcare centre. The model needs to be further trained and validated on large multicentric cohorts to increase its accuracy and robustness. In the current study, only EGFR mutation status was considered. The relationship between EGFR mutation and other genetic alterations (e.g., ROS-1, ALK) can be explored in future work, as has been done in a few preliminary studies [30].

Conclusions
Radiomics and deep learning models show promising results in the prediction of EGFR mutation status. The accuracy increases further when CT semantic features are considered along with the deep learning model. Applying both models in clinical practice can help predict a patient's EGFR mutation status while lung biopsy or genetic test results are pending. Further improvements in the sensitivity and specificity of both models are expected with larger data sets.

Figure 1. Selection of ROI (red box) for DL model.

Figure 2. Illustration of the DL model.

Figure 4. Structure of a Wide Residual Network with width k.

Figure 5. Manual segmentation of the tumour (in red) in all three planes using multiplanar reconstruction for the extraction of radiomics features.

Figure 6. Deep learning features and radiomics features workflow.

Figure 8. (a) ROC curves based on 990 studies with three-fold cross-validation of radiomic features. (b) ROC curves based on 990 studies with three-fold cross-validation of deep learning features. (c) ROC curves before adding the semantic features to the 223 cases. (d) ROC curve after the addition of semantic features to the 223 cases.

Figure 9. ROC curves from 990 studies with three-fold cross-validation on deep learning features.

Table 1. List of 9 feature vectors, the models used and their combinations.

Table 3. Association between clinical features and EGFR mutation status.

Table 4. CT features and EGFR mutation status.

Table 4 (cont.). CI: confidence interval. Data in square brackets [] are percentages and data in parentheses () are ranges. ^ p-value was based on the comparison between the EGFR mutation group and the wild-type group. Data in braces {} are 95% confidence intervals (CIs). p-values with statistical significance are shown in bold.