Radiogenomic System for Non-Invasive Identification of Multiple Actionable Mutations and PD-L1 Expression in Non-Small Cell Lung Cancer Based on CT Images

Simple Summary: Actionable mutations and PD-L1 expression are of paramount importance for the precision treatment of lung cancer. Radiogenomics is a promising field that integrates radiologic images and genomic data through artificial intelligence. This approach enables non-invasive assessment of genes, but the vast majority of studies are limited to single-gene mutation prediction. Our study aimed to propose a multi-label multi-task deep learning (MMDL) system to predict molecular status based on routinely acquired computed tomography (CT) images using deep learning and radiomics. A dataset of CT images from 1096 non-small cell lung cancer (NSCLC) patients with molecular tests was curated for training, validation, and testing. The MMDL model achieved superior performance on the task of simultaneously identifying eight genes or even ten molecules. This system has the potential to serve as an auxiliary support tool to advance precision oncology.

Abstract: Purpose: Personalized treatments such as targeted therapy and immunotherapy have revolutionized the treatment paradigm for non-small cell lung cancer (NSCLC). However, these treatment decisions require the determination of targetable genomic and molecular alterations through invasive genetic or immunohistochemistry (IHC) tests. Numerous previous studies have demonstrated that artificial intelligence can accurately predict the single-gene status of tumors from radiologic imaging, but few studies have achieved the simultaneous evaluation of multiple genes to reflect more realistic clinical scenarios. Methods: We propose a multi-label multi-task deep learning (MMDL) system for non-invasively predicting actionable NSCLC mutations and PD-L1 expression from routinely acquired computed tomography (CT) images.
This radiogenomic system integrated transformer-based deep learning features and radiomic features of CT volumes from 1096 NSCLC patients based on next-generation sequencing (NGS) and IHC tests. Results: For each task cohort, we randomly split the corresponding dataset into training (80%), validation (10%), and testing (10%) subsets. The areas under the receiver operating characteristic curves (AUCs) of the MMDL system reached 0.862 (95% confidence interval (CI), 0.758–0.969) for discrimination of a panel of 8 mutated genes (EGFR, ALK, ERBB2, BRAF, MET, ROS1, RET, and KRAS), 0.856 (95% CI, 0.663–0.948) for identification of a 10-molecule status panel (the previous 8 genes plus TP53 and PD-L1), and 0.868 (95% CI, 0.641–0.972) for classifying the EGFR/PD-L1 subtype. Conclusions: To the best of our knowledge, this is the first deep learning system to simultaneously analyze 10 molecular expressions, and it might be utilized as an assistive tool in conjunction with or in lieu of ancillary testing to support precision treatment options.


Study Population
The data of all NSCLC patients who visited West China Hospital of Sichuan University from April 2018 to June 2020 were collected in this study (Figure 2). Complete anonymization of the data was performed before inclusion. Patients who met the following inclusion criteria were enrolled: (1) histologically diagnosed with NSCLC; (2) had molecular tests, including Amplification Refractory Mutation System-Polymerase Chain Reaction (ARMS-PCR) or NGS, to confirm the status of EGFR, ALK, ERBB2, BRAF, MET, ROS1, RET, KRAS (8-panel), and TP53, with PD-L1 expression status detected using the SP142 antibody in IHC assays performed on the Ventana Benchmark platform; and (3) had a preoperative CT examination performed within 1 month before diagnosis.
Patients were excluded based on the following criteria: (1) low-quality CT images with image artifacts (due to metal objects) or motion artifacts (including breathing); (2) an indistinguishable tumor contour unsuitable for CT segmentation due to nearby obstructive pneumonia or atelectasis; and (3) receipt of preoperative treatment. On the basis of these criteria, 1096 patients were identified with a diagnosis of NSCLC and definite multiple molecular expression statuses (positive and negative types); 932 patients were chosen to form the 8-panel cohort, 637 patients were chosen to form the 10-panel cohort (8 genes plus TP53 and PD-L1) for further prediction, and 206 patients were collected for subtype prediction.

Imaging Acquisition and Preprocessing
We retrieved DICOM files of the CT scan from the Picture Archiving and Communication System (PACS). All scans had a reconstructed slice thickness ranging from 1 mm to 5 mm, a voltage of 120 kV, a current of 200-350 mA, and a matrix size of 512 × 512. CT scans typically store raw voxel intensities in Hounsfield units (HU), and the raw voxel value was normalized to 0 to 255 with a windowing based on the lung window.
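As an illustration of the windowing step, the mapping from Hounsfield units to the 0–255 range can be sketched as follows. The exact lung-window settings are not stated in the text, so the level and width below are assumptions for illustration only.

```python
import numpy as np

# Assumed lung-window settings (level -600 HU, width 1500 HU);
# the paper does not state the exact values used.
LEVEL, WIDTH = -600.0, 1500.0

def window_to_uint8(hu):
    """Clip raw Hounsfield units to the lung window and rescale to 0-255."""
    lo, hi = LEVEL - WIDTH / 2, LEVEL + WIDTH / 2
    hu = np.clip(hu, lo, hi)                      # clip to the window
    return ((hu - lo) / (hi - lo) * 255.0).astype(np.uint8)
```

Voxels below the window floor map to 0, those above the ceiling to 255, and intermediate intensities scale linearly.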
To train the radiomics and deep learning models, we first needed delineations of the mutation-related nodules. Given that manually segmenting the contours of chest abnormalities according to the original records would be time-consuming, it was vital to leverage automated contour extraction to produce large-scale annotated molecular datasets. Automated AI segmentation models can be employed to delineate all lung nodules in a CT scan; however, there may be more than one nodule per scan, in which case clinicians needed to identify the targeted nodule manually. Therefore, the whole mask generation process required a two-phase procedure.
First, we adopted an off-the-shelf DenseNet as the model backbone to automatically segment all nodule areas in the chest CT images [29]. The lesion segmentation model employed a feature pyramid block to form the U-shaped architecture widely used to build segmentation models [30]. Then, radiologists with at least 5 years of expertise in thoracic disease diagnosis quickly reviewed the segmentation results and localized the targeted nodules according to previous inspection reports to form the final datasets.
As a further pre-processing step, nodule regions of interest (ROIs) and mask ROIs were first cropped based on the confirmed nodule mask and then normalized to a size of 64 × 64 × 64 using third-order spline interpolation for further analysis. The training set was then balanced using data augmentation techniques such as horizontal flipping, random rotation, random blurring, and reweighting. Patients whose specimens underwent histological staining or were used for molecular testing (8-panel, 10-panel, and subtype) were used to evaluate the performance of our models on binary classification of molecular status (positive and negative), prediction of multiple molecular alterations, and classification of subtype; '+' indicates positive type, while '-' indicates negative type.
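The resampling step above can be sketched with SciPy's `zoom`, which performs the third-order spline interpolation; the input shape below is hypothetical.

```python
import numpy as np
from scipy.ndimage import zoom

def resize_roi(roi, target=(64, 64, 64)):
    """Resample a cropped nodule ROI to 64x64x64 with third-order
    spline interpolation (order=3), as in the pre-processing step."""
    factors = [t / s for t, s in zip(target, roi.shape)]
    return zoom(roi, factors, order=3)
```

The per-axis zoom factors are derived from the source shape, so ROIs of any size end up on the same 64 × 64 × 64 grid.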

Radiomics Approach
To automatically extract radiomics features from CT scans, the radiomics approach employed the open-source Python package Pyradiomics (version 3.0.1). To ensure data point validity, a total of 1052 radiomics features were extracted from each ROI, considering only the 9 largest metastatic sites in each lesion; these comprised 19 first-order features, 16 shape features, and texture features. Due to the high dimensionality of the radiomics feature space, we analyzed the similarity of each feature pair to eliminate irrelevant or highly correlated features, improving the generalization ability and optimizing the model. We started by removing features with a training-set variance below 0.8. Next, we standardized all radiomics features by scaling each feature to unit 2-norm. The K-best feature selection method was then applied to the normalized radiomics features, and the remaining features were passed to the least absolute shrinkage and selection operator (LASSO)-penalized Cox proportional hazards regression method. The customized signature was then created by combining all critical features in a weighted linear fashion, and a personalized signature score was calculated for each lesion.
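A minimal sketch of this selection pipeline, using scikit-learn stand-ins: plain LASSO in place of the LASSO-penalized Cox model, random data in place of the real feature matrix, and illustrative parameter values (`k`, `alpha`) not taken from the paper.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold, SelectKBest, f_classif
from sklearn.preprocessing import normalize
from sklearn.linear_model import Lasso

# Hypothetical data standing in for the extracted radiomics features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 200))
y = rng.integers(0, 2, size=100)

# 1) drop features with training-set variance below 0.8
X1 = VarianceThreshold(threshold=0.8).fit_transform(X)
# 2) scale each feature (column) to unit 2-norm
X2 = normalize(X1, norm="l2", axis=0)
# 3) univariate K-best selection on the normalized features
X3 = SelectKBest(f_classif, k=50).fit_transform(X2, y)
# 4) LASSO shrinks most remaining coefficients to zero;
#    the survivors form the weighted linear signature
coef = Lasso(alpha=0.05).fit(X3, y).coef_
selected = np.flatnonzero(coef)
```

The signature score for a lesion is then the weighted sum of its selected features, using the non-zero LASSO coefficients as weights.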

Convolutional Neural Network-Based Deep Learning
The convolutional neural network (CNN)-based design relied on ResNet-3D as a backbone with a lesion mask-guided attention mechanism to focus on lesion regions, enhancing the lesion response while suppressing others [31]. We applied the lesion mask-guided attention to mine the lesion-mask-enhanced features and pay more attention to the interaction between lesions and surrounding tissues, thereby increasing the model's representation capacity (Figure 3A, mask-guided attention). First, the standardized lesion-ROI and associated mask were separated into two images, which were then fed into the convolutional layer to obtain deep features of the lesion and surrounding tissues. Second, the similarity between the lesion and tissue pixels was computed and normalized to obtain the weight of each point, which was then multiplied by the features of the corresponding mapped point. This method took into account the detailed information of the focus region, its position in the whole image, and its dependence on neighboring areas: the greater the similarity and its effect on a point, the more strongly other points are connected to it. This mask-guided mechanism was employed at the beginning of the backbone. Then, several identity blocks were used to allow information to flow more smoothly from one layer to the next (Figure 3A, identity block). Finally, global average pooling replaced the model's top layers, after which a fully connected layer of 512 nodes (the dimension of the deep learning features) with rectified linear unit (ReLU) activation and a final fully connected layer with 8 or 10 nodes (8-panel and 10-panel) with Softmax activation were created.
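The similarity-weighting step of the mask-guided attention can be sketched in NumPy. The flattened feature shapes, the dot-product similarity, the softmax normalization, and the residual addition are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def mask_guided_attention(lesion_feat, tissue_feat):
    """Toy sketch: reweight tissue features by their similarity to
    lesion features and add the result back to the lesion features.
    lesion_feat, tissue_feat: (N, C) arrays of flattened spatial positions."""
    # pairwise similarity between lesion and tissue positions
    sim = lesion_feat @ tissue_feat.T                    # (N, N)
    # normalize similarities into attention weights (row-wise softmax)
    w = np.exp(sim - sim.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    # multiply weights by the corresponding mapped features
    return lesion_feat + w @ tissue_feat
```

Positions with higher similarity contribute more strongly to each output point, which mirrors the "greater similarity, stronger connection" behavior described above.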

Figure 3.
Overall framework of the proposed CNN-based and transformer-based networks. Both networks took two patches as inputs: the standardized lesion-ROI and the associated mask-ROI, combined into a two-channel volume. Before the first stage of each backbone, a mask-guided mechanism was employed to boost the model's representation capacity. (A) The ResNet-3D network was developed by applying a ResNet3D-18 feature extractor to each 3D volume and employing multiple binary cross-entropy loss functions. (B) The Transformer network relied on the 3D-Swin-transformer as the backbone. The 3D-Swin-transformer merged image patches to build hierarchical feature maps. Two successive Swin Transformer blocks performed a cyclic shift of local windows for shifted-window-based self-attention computation, and the multi-head self-attention module computed self-attention within each local 3D window.

Transformer-Based Deep Learning
The main architecture of the 3D-Swin-transformer comprised four stages, with each stage reducing the resolution of the input feature map and expanding the receptive field layer by layer, similar to a CNN. The model mainly consisted of three components (Figure 3B): (1) patch embedding: for the input 3D ROI, linear embedding changes the dimensions of the input vector into preset tokens that can be processed by the Transformer; (2) patch merging: this module performed down-sampling before the start of each stage to reduce the resolution and adjust the number of channels, forming a hierarchical design; the 3D-Swin-transformer could generate hierarchical feature maps at various resolutions through patch merging layers, making it suitable as a general-purpose backbone for pixel-level operations; and (3) window attention: this calculated the relationships between each patch in an ROI and all the other patches, and to a certain extent, these relationships reflected the relevance and importance of the different patches in the ROI. Self-attention was the core of the coding unit and the most significant component of the proposed transformer-based paradigm at each transformer block. Window-based multi-head self-attention and shifted-window-based multi-head self-attention were applied successively in each Swin-transformer block to further extract global interactions between adjacent window patches (Figure 3B, two successive Swin transformer blocks). Notably, to emphasize the nodule areas when extracting the deep features, we also adopted the two-channel input for the transformer model to achieve lesion mask-guided feature extraction.
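The window partition and cyclic shift underlying shifted-window attention can be illustrated with a small NumPy sketch; the window size and shift amount here are arbitrary, and real Swin blocks operate on multi-channel feature maps rather than a scalar volume.

```python
import numpy as np

def cyclic_shift(volume, shift=2):
    """Cyclically shift a (D, H, W) feature volume along each spatial axis,
    as done before shifted-window self-attention."""
    return np.roll(volume, shift=(-shift, -shift, -shift), axis=(0, 1, 2))

def window_partition(volume, w=4):
    """Split a (D, H, W) volume into non-overlapping w*w*w local windows;
    self-attention is then computed independently within each window."""
    d, h, wd = volume.shape
    v = volume.reshape(d // w, w, h // w, w, wd // w, w)
    return v.transpose(0, 2, 4, 1, 3, 5).reshape(-1, w, w, w)
```

Alternating plain and shifted partitions lets information cross window boundaries, which is how adjacent-window interactions are captured without global attention cost.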

Multi-Label Multi-Task Deep Learning (MMDL) System
For each task, the multi-label multi-task deep learning (MMDL) system achieved multi-label prediction by using multiple binary classifiers to analyze whether a patient has positive expression of molecules such as EGFR, ALK, ERBB2, etc. To handle the multiple tasks, the model employed full connectivity to allow self-adaptation based on a combination of deep learning and radiomics features (Figure 1D). Because these related features originate in different dimensional spaces, how to integrate them into joint models had not previously been addressed. In this study, we first used different approaches (radiomics-based and transformer-based) to extract the radiomics and deep learning features. To investigate a better feature fusion, we computed the correlation heatmap of radiomics and deep learning features to visualize the relative contributions of the features to molecular status prediction. The reference distribution was the correlation distribution across all features from the patient cases. The XGBoost approach was then used to select the key features identified as the most significant contributors, which were fed, through a new buffer layer, into an embedding feature space, making superior use of the strengths of individual features [32]. Finally, the embedded features were used to complete multiple tasks such as gene mutation analysis, molecular expression analysis, and subtype identification.
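The multi-label formulation, one independent binary classifier per molecule, can be sketched as follows; the linear-plus-sigmoid head and the 0.5 threshold are illustrative assumptions standing in for the trained network's output layer.

```python
import numpy as np

GENES = ["EGFR", "ALK", "ERBB2", "BRAF", "MET", "ROS1", "RET", "KRAS"]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def multilabel_predict(features, weights, biases, threshold=0.5):
    """One independent binary (sigmoid) output per gene: each node answers
    'is this molecule positive?' on its own, so several genes can be
    flagged positive for the same patient."""
    logits = features @ weights + biases      # (n_samples, n_genes)
    probs = sigmoid(logits)
    return {g: probs[:, i] >= threshold for i, g in enumerate(GENES)}
```

Unlike a softmax over mutually exclusive classes, independent sigmoids allow the co-alteration patterns (e.g., EGFR together with TP53) that a multi-label panel must express.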

Statistical Analysis
The performance of the models was assessed according to AUC, specificity, and sensitivity with 95% CIs. To generate averages and standard deviations for each set of cross-validation trials, performance indicators were averaged over the k folds. The cut-point was determined by maximizing the sum of sensitivity and specificity, i.e., the Youden index on the receiver operating characteristic (ROC) curve. For each genetic mutation, DeLong's test was used to compare the diagnostic performance of the radiomics model, deep learning model, and combined model. All statistical tests were two-sided, with p < 0.05 denoting statistical significance.
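The cut-point rule (maximizing sensitivity plus specificity, equivalently the Youden index) can be sketched as a brute-force search over the observed scores:

```python
import numpy as np

def youden_cutpoint(y_true, scores):
    """Return the threshold maximizing sensitivity + specificity
    (i.e., the Youden index J = sens + spec - 1)."""
    best_t, best_j = None, -1.0
    for t in np.unique(scores):
        pred = scores >= t
        sens = np.mean(pred[y_true == 1])     # true positive rate
        spec = np.mean(~pred[y_true == 0])    # true negative rate
        if sens + spec > best_j:
            best_j, best_t = sens + spec, t
    return best_t
```

Only the observed scores need to be checked, since sensitivity and specificity change only at those values.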

Patient Characteristics
We established a Cancer Shared Database (CSD) covering the radiology images and molecular information of 1096 patients diagnosed with NSCLC at West China Hospital of Sichuan University (Figure 1A). The CSD was divided into four cohorts according to the different prediction tasks (Figure 1B). During image preprocessing, the DenseNet model with feature pyramid networks was utilized to automatically segment all lesion areas in chest CT images (Figure 1C). Radiologists with at least 5 years of expertise in thoracic tumor diagnosis then quickly reviewed the segmentation results to form the training and validation datasets. Afterward, the cropped nodule ROIs were standardized to the same size for model construction. Furthermore, we explored various AI approaches and an improved combination of methods to evaluate the association between the features extracted using a standard radiomics pipeline and those extracted using the deep learning pipeline (CNN-based or transformer-based). Finally, the MMDL system, which integrated 512 deep learning features and the 20 highest-performing radiomics features, was established to achieve the simultaneous prediction of multiple molecular statuses (Figure 1D). For each task cohort, we randomly split the dataset into training (80%), validation (10%), and testing (10%) subsets.

The Performance of the Deep Learning Models
The performance of the deep learning models was generally superior to that of the radiomics model, regardless of whether the CNN-based or transformer-based model was used (Table 2). The transformer-based design relied on the 3D-Swin-transformer as the backbone along with a lesion mask-guided feature extraction scheme [33]. The model first employed the patch partition operation to separate the standardized lesion-ROI and related mask, combined into a two-channel volume, into tiny patches, which were then fed into the shifted-window transformer block to model the long-range dependencies among and within those patches. The quantitative performance is shown in Table 2, and the detailed diagnostic measures of all CNN-based and transformer-based models are shown in Figures S2 and S3, indicating that the transformer-based model achieved better performance than the CNN-based model with a significant difference (AUC = 0.847, 95% CI, 0.763–0.942 versus AUC = 0.825, 95% CI, 0.682–0.891 in the target molecular testing cohort, p < 0.0001). A similar improvement over the CNN-based model was also observed in the other cohorts, supporting the idea that transformer-based features can be selected for deep learning prediction models that automatically extract better deep features of the ROI to predict molecular expression.

Correlation Analysis between Radiomics and Deep Learning Features
To further illustrate the association between deep learning and radiomics features in predicting multiple molecular alterations and mutation status, we utilized a variety of methodologies to develop a better fused expression of tumor characteristics. Within each feature bank, we employed 20 radiomics and 512 transformer-based features to produce a heatmap depicting the correlation between the two feature sets. In Figure S4A, each dot represented a correlation coefficient: red indicated a coefficient of zero, whereas white and black dots reflected positive and negative correlations, respectively. The heatmap demonstrated a strong linear relationship between important features (radiomics vs. deep learning features). Figure S4B depicted the predictive performance of radiomics and deep learning features for positive and negative patients, with practically all CT volume characteristics demonstrating a strong ability to differentiate between the two groups.
However, because these related features originated in separate feature dimension spaces, the usual feature selection approaches could not be applied directly. We therefore applied the SHapley Additive exPlanations (SHAP)-based XGBoost method to complete the multivariable logistic regression and calculate the influence of a given variable on the prediction (Figure 5A) [34]. Furthermore, 19 deep learning and 5 radiomics key features were significantly associated with more than one molecule, indicating the potential of fusion features in predicting molecular co-alteration status (Figure 5B).

Discussion
Radiogenomic approaches aggregate radiology and genomics data based on the hypothesis that radiomic features reflect macroscopic and molecular properties of tissues. Such tests in routine imaging could offer the ability to capture features from a full 3D volume of the tumor, avoiding sampling errors due to intra-tumor heterogeneity. In this research, a hybrid deep learning model named MMDL was developed to evaluate actionable mutations and PD-L1 expression non-invasively based on CT images of 1096 patients with lung cancer. This approach combined 512 transformer-based deep learning features and approximately 20 radiomic features to predict 10 molecular states with AUC performance above 0.799 ( Figure 4C).
To the best of our knowledge, this is the first study to predict mutations in 8 actionable genes, or even 10 molecules, based on CT images. The predictive performance of the MMDL model was excellent in identifying different molecular and subtype statuses, which has potential for clinical application. It could aid in assessing patients' molecular status non-invasively and assist clinicians in making diagnosis and treatment decisions. However, the predictive performance varied among molecules. For example, the best AUC of the MMDL model in the 8-panel task was 0.903 (95% CI, 0.786–1.000) for ALK and the worst was 0.793 (95% CI, 0.686–0.936) for BRAF. This may be related to the different mutation frequencies and training sample sizes (82 patients with ALK mutation; 34 patients with BRAF mutation). The same situation has occurred in the prediction of genetic mutations from pathological images. Some investigators developed a CNN model based on the Inception v3 architecture for the automatic analysis of tumor slides using publicly available whole-slide images from TCGA [35]. They found that six genes mutated in at least 10% of the available tumors, including STK11, EGFR, FAT1, SETBP1, KRAS, and TP53, could be predicted from pathology images with AUCs ranging from 0.733 to 0.856, but their model was not able to detect genes with lower incidence, such as ALK. This suggests an urgent need for a publicly shared database to enable the construction of more accurate artificial intelligence models.
To efficiently extract image features for molecular prediction, we established deep learning models based on transformer and CNN architectures, respectively. The Transformer is a deep learning network that avoids recurrence and relies entirely on the attention mechanism to model the global dependencies between input and output. This design overcomes the limitation that recurrent neural network (RNN) models cannot be computed in parallel, and the number of operations required to relate two positions does not grow with their distance, in contrast to CNNs. Furthermore, the MMDL model integrated radiomics and deep learning features to achieve remarkable prediction results. Different modalities of medical data provide diagnostic and treatment information from specific perspectives, and the characteristics of clinical multimodal data provide the basis for accurate disease diagnosis [36,37]. Some researchers developed Tumor Origin Assessment via Deep Learning (TOAD) to predict the origins of 18 common primary/metastatic tumors and cancers of unknown primary origin based on a multiclass, multitask, multiple-instance learning architecture. Compared with the single-modal single-task model, the fusion model performed 2.0% better in primary tumor prediction and 6.8% better in tumor metastasis prediction, and the overall accuracy reached 83.4% [38]. Multimodal data fusion is a future trend in the development of medical diagnosis and treatment methods.
There were some limitations to this study. First, all data came from a single medical center, so the generalizability of the model requires verification with multicenter data. Second, the deep learning process was opaque and lacked interpretability; we explored feature correlations, but a considerable gap to clinical practice remains. Third, this study focused on multiple molecular statuses and lacked patient efficacy and prognostic assessments. Previous studies have confirmed that deep learning features related to molecular status can be used to evaluate treatment efficacy [39], and we will conduct more in-depth and detailed research in the future.

Conclusions
The MMDL system was established and validated, achieving excellent predictive performance for 10 molecular alterations and specific subtypes in NSCLC. The radiogenomic model combined routine clinical radiological scans with artificial intelligence to detect molecular status non-invasively, making it a potential decision-support tool to assist physicians in cancer treatment management.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/cancers14194823/s1, Figure S1: The radiomics model performance in the prediction of multiple molecular alterations; Figure S2: The CNN-based deep learning model performance in the prediction of multiple molecular alterations; Figure S3: The transformer-based deep learning model performance in the prediction of multiple molecular alterations; Figure S4: Association between radiomics and deep learning features; Table S1: The clinical characteristics of the cancer shared dataset.

Informed Consent Statement: Because this study was retrospective, the requirement for patient informed consent was waived.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.