Systematic Review on the Association of Radiomics with Tumor Biological Endpoints

Simple Summary
In this systematic review, we aim to highlight the existing literature devoted to the association between medical imaging radiomics and cancer biological endpoints. The use of radiomics as an ancillary tool in cancer treatment would allow for a non-invasive, inexpensive, three-dimensional characterization of the tumor phenotype, contributing to the delivery of precision medicine. Nonetheless, its clinical application remains a challenge, as extensive, multi-center studies validating the connection between radiomic features and tumor biology are required. In this review, we searched the PubMed database for peer-reviewed studies that evaluate the association between radiomic features and the following set of clinically relevant tumor markers: anaplastic lymphoma kinase (ALK), v-raf murine sarcoma viral oncogene homolog B1 (BRAF), epidermal growth factor receptor (EGFR), human epidermal growth factor receptor 2 (HER-2), isocitrate dehydrogenase (IDH), antigen Ki-67, Kirsten rat sarcoma viral oncogene homolog (KRAS), programmed cell death ligand 1 (PD-L1), tumor protein p53 (TP-53) and vascular endothelial growth factor (VEGF).

Abstract
Radiomics constitutes an alternative, non-invasive tool for tumor characterization, which has attracted increased interest with the advent of more powerful computers and more sophisticated machine learning algorithms. Nonetheless, the incorporation of radiomics in cancer clinical decision-support systems still necessitates a thorough analysis of its relationship with tumor biology. Herein, we present a systematic review focusing on the clinical evidence of radiomics as a surrogate method for tumor molecular profile characterization. An extensive literature review was conducted in PubMed, including papers on radiomics and a selected set of clinically relevant and commonly used tumor molecular markers. We summarized our findings by cancer entity, additionally evaluating the effect of different imaging modalities on the prediction of biomarkers at each tumor site. Results suggest the existence of an association between the studied biomarkers and radiomics from different modalities and different tumor sites, even though a larger number of multi-center studies is required to further validate the reported outcomes.


Introduction
Cancer precision medicine involves therapy adaptation to improve clinical outcome based on patient-specific characteristics as well as the tumor-specific molecular profile. The advent of high-throughput gene-sequencing techniques in the last decade has allowed for the identification of multiple tumor molecular markers, also known as signature molecules. The markers considered in this review are the following:
• EGFR, HER-2 and ALK are all receptor tyrosine kinases, located on the cell surface and activated through the binding of ligands, mostly growth factors. This leads to the activation of a whole range of downstream signaling cascades and results in cell survival, proliferation and migration [5,6].
• KRAS and BRAF are the genes responsible for making the proteins K-ras and B-raf, which are, amongst others, involved in important signaling pathways (e.g., Ras-Raf-MAPK, PI3-K-AKT) [7,8]. Mutation or down-/up-regulation of any of these kinases can lead to malignancy and, in particular, cancer formation.
• VEGF is a signaling factor promoting the formation of new blood vessels. To grow and metastasize, solid cancers require blood supply, which they attain by expressing VEGF to form supporting vasculature [9].
• TP-53 is involved in the regulation of and progression through the cell cycle, monitors genomic stability and can induce apoptosis. It is one of the most prominent tumor suppressors [10].
• PD-L1 is involved in suppressing the adaptive arm of the immune system. By upregulating PD-L1 expression, cancer cells may evade the host immune system [11].
• IDH catalyzes the decarboxylation of isocitrate. Through deregulation of this metabolic step, cancer progression can be initiated or supported [12].
• Ki-67 is a protein that is present during all active phases of the cell cycle but absent in resting (quiescent) cells [13]. Therefore, this cellular proliferation marker is frequently used to distinguish fast-growing cell populations, such as cancer cells, from normal cells.
Throughout this review, the term "biomarker" refers, for the sake of simplicity, to any of the above-mentioned biological endpoints. This is also in accordance with the World Health Organization (WHO), which defines a biomarker as "any substance, structure, or process that can be measured in the body, or its products and influences or predicts the incidence of outcome or disease" [14].

Materials and Methods
The analysis was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA-P) statement [15]. The protocol for this systematic review was registered at PROSPERO (CRD42020207220) and is available at https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42020207220 (accessed on 11 June 2021). No amendments were made with respect to the published protocol.

Literature Search
The search was conducted in the PubMed database. According to PRISMA guidelines, article selection was carried out in multiple steps. The literature search was performed using the query "Radiomics [All Fields] AND keyword [All Fields]", where keyword corresponded to one of the ten molecular markers under study (i.e., ALK, BRAF, EGFR, HER-2, IDH, Ki-67, KRAS, TP-53, PD-L1 and VEGF) and the possible variations in its naming (e.g., HER2 and HER-2). The full list of queries is provided in the Supplementary Materials (List S1). In total, twenty independent searches were performed. No records were included from other sources such as direct correspondence with authors. The search had no start date limit and was concluded on 31 March 2020.
For each independent search, all retrieved studies were collected, and duplicates were subsequently removed using the open-source reference management software Zotero [16].
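The query construction and duplicate removal described above can be sketched in a few lines of Python. This is a minimal illustration rather than the tooling actually used for the review: apart from the HER2/HER-2 pair named in the text, the naming variants below are hypothetical (the full list is given in List S1), and the `pmid` record key is an assumed stand-in for whatever identifier a reference manager relies on.

```python
# Hypothetical naming variants; only the HER-2/HER2 pair is taken from the text.
MARKER_VARIANTS = {
    "HER-2": ["HER-2", "HER2"],
    "Ki-67": ["Ki-67", "Ki67"],   # assumed variant
    "TP-53": ["TP-53", "TP53"],   # assumed variant
}


def build_query(keyword):
    """Return the search string for one keyword, following the stated pattern."""
    return f"Radiomics [All Fields] AND {keyword} [All Fields]"


def build_queries(variants):
    """Return one query per naming variant of every marker."""
    return [build_query(name) for names in variants.values() for name in names]


def deduplicate(records):
    """Keep the first occurrence of each record, keyed by an assumed 'pmid' field."""
    seen = set()
    unique = []
    for record in records:
        if record["pmid"] not in seen:
            seen.add(record["pmid"])
            unique.append(record)
    return unique
```

Pooling the results of all independent searches and passing them through `deduplicate` mirrors the step in which a record retrieved by several queries is counted only once.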

Eligibility Criteria
During the first screening phase, those studies which did not fulfil the following requirements were excluded: (1) the article had to be written in English, (2) the study had to be a scientific article excluding reviews, (3) the topic had to be related to biomarkers in cancer. Following this step, every article was assigned to one of the following categories, depending on the cancer site: breast, central nervous system (CNS), gastrointestinal, liver, lung and others.
The full-text articles were then assessed for eligibility. An article was excluded from the final analysis if at least one of the following criteria applied: (1) patients from only one of the two groups, biomarker-negative or biomarker-positive, were included in the study, (2) the total number of patients was fewer than 40, (3) the association between the biomarker and radiomic features was not investigated, (4) the biomarker analyzed was not among the ten biomarkers defined in the search and (5) fewer than 20 image features were investigated.
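The five exclusion criteria can be expressed as a simple rule check. The sketch below is illustrative only; the `Study` record and its field names are hypothetical and were introduced here to make the criteria explicit.

```python
from dataclasses import dataclass

STUDIED_MARKERS = {
    "ALK", "BRAF", "EGFR", "HER-2", "IDH",
    "Ki-67", "KRAS", "PD-L1", "TP-53", "VEGF",
}


@dataclass
class Study:
    # Hypothetical fields, one per eligibility criterion.
    n_patients: int
    n_features: int
    biomarker: str
    has_both_groups: bool      # biomarker-positive and biomarker-negative patients
    tests_association: bool    # biomarker-radiomics association investigated


def exclusion_reasons(study):
    """Return the list of criteria (1)-(5) that the study violates."""
    reasons = []
    if not study.has_both_groups:
        reasons.append("only one biomarker group included")
    if study.n_patients < 40:
        reasons.append("fewer than 40 patients")
    if not study.tests_association:
        reasons.append("association not investigated")
    if study.biomarker not in STUDIED_MARKERS:
        reasons.append("biomarker outside the ten studied markers")
    if study.n_features < 20:
        reasons.append("fewer than 20 image features")
    return reasons


def is_eligible(study):
    """A study is eligible only if no exclusion criterion applies."""
    return not exclusion_reasons(study)
```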

Analysis
Those articles that satisfied the screening and eligibility criteria were included in the following analysis, with each tumor site corresponding to a dedicated subsection in this review. First, the distribution of the number of patients included within all the studies was evaluated. The frequency of investigation of a given biomarker for each tumor site was collected in a dedicated table, together with the total number of studies on each tumor site and on each biomarker.
For each study, we gathered the following information when available: the studied biological endpoint and its alteration, e.g., mutation on a specific exon, over-expression, etc.; the imaging modality; the origin of the dataset; the training set size; the validation set size and type of validation, i.e., internal, temporally independent, external, leave-one-out, 3-, 5- and 10-fold cross-validation (LOOCV, 3-CV, 5-CV, 10-CV) or bootstrap methods; the initial number of studied radiomic features; the application of feature reduction and feature robustness analysis methods; the reported performance, i.e., the area under the receiver-operating characteristic curve (AUC), classification accuracy or c-index; the public availability of the code and/or data; the reported quality score of the radiomics study, e.g., the transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) score [17] or the Radiomics Quality Score (RQS) [4].
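Since the AUC is the performance measure reported most often in the following sections, a short sketch of its meaning may be helpful. The function below computes the AUC as the probability that a randomly chosen biomarker-positive case receives a higher model score than a randomly chosen biomarker-negative case (the Mann-Whitney formulation). It is an illustration with made-up scores and does not reproduce how any of the reviewed studies computed their results.

```python
def auc(scores_positive, scores_negative):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    if not scores_positive or not scores_negative:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


# Illustrative scores only: 8 of the 9 positive/negative pairs are ranked correctly.
example_auc = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
```

An AUC of 0.5 corresponds to chance-level ranking and an AUC of 1.0 to perfect separation of the two groups.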
Furthermore, radiomic features of the best performing models on the training set were identified for each combination of tumor site, biomarker and image modality, in order to provide, when possible, a visual interpretation of the findings. For consistency, performance on the training set was evaluated since external validation was only performed on a small fraction of the studies. Moreover, in this comparison, the selection was limited to models based solely on radiomic features, i.e., mixed models including clinical-radiological data were excluded. This process was done independently by each of the authors in the systematic review. If the study provided a visual interpretation of such features, it was recorded. Otherwise, whenever possible, the missing interpretation was provided by the authors.
In accordance with PRISMA guidelines, a strategy for bias risk minimization was adopted as follows: the processes of screening, eligibility evaluation and extraction of data for the meta-analysis were performed independently by authors ALG, DV, FT, RDB and VW. Each author analyzed one specific tumor site. The more experienced authors JEvT, ST-L, MG and MP supervised the process and guaranteed a uniform and unbiased analysis throughout the different tumor sites. A detailed description of this process can be found on the PRISMA checklist in the Supplementary Materials (List S2).

Literature Search, Eligibility Criteria and Study Selection
A diagram summarizing the study selection workflow following PRISMA guidelines is shown in Figure 1. A total of 304 records were first retrieved from PubMed. After duplicate removal, 183 articles were left for screening. The first screening excluded 33 articles, leaving 150 full-text studies for the eligibility assessment. After further evaluation, 46 references were excluded because they did not meet the conditions previously defined. As a result, 104 articles were included in the current review.
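The record counts reported above are mutually consistent, as the following arithmetic check illustrates. The variable names are ours; the numbers are those stated in the text.

```python
def prisma_flow(retrieved, after_deduplication, excluded_screening, excluded_eligibility):
    """Derive the intermediate counts of the study selection workflow."""
    duplicates = retrieved - after_deduplication
    full_text = after_deduplication - excluded_screening
    included = full_text - excluded_eligibility
    return {"duplicates": duplicates, "full_text": full_text, "included": included}


flow = prisma_flow(
    retrieved=304,
    after_deduplication=183,
    excluded_screening=33,
    excluded_eligibility=46,
)
```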
The size of the dataset under study varied significantly among the reported papers (43-1010 patients). As mentioned above, studies including fewer than 40 patients in total were excluded from the analysis. The mean number of patients included was 198. The distribution is shown in Figure 2.
The best predictive performance was achieved by Li et al. on MR images of glioblastoma patients by means of a random forest (RF) model based on gray-level co-occurrence matrix (GLCM), gray-level run-length matrix (GLRLM) and gray-level size zone matrix (GLSZM) textural features together with patient age (AUC = 0.96, external validation) [19]. Among MR radiomics in both lower- and higher-grade gliomas, relevant features for IDH mutation status prediction were associated with textural heterogeneity, suggesting that IDH wild-type tumors are more heterogeneous and more structurally complex than IDH-mutant ones [19,20,26]. Another feature that was significantly linked with IDH mutation status was the tumor mean surface-to-volume ratio, which was lower in IDH-mutant cases [20,23]. Moreover, IDH-mutant gliomas were found to occur more frequently in the frontal, insular and temporal lobes [24,26].
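To illustrate how GLCM features capture the textural heterogeneity referred to above, the following sketch builds a symmetric, normalized co-occurrence matrix for a single pixel offset and derives two common descriptors, contrast and energy. It is a simplified two-dimensional example under our own assumptions (integer gray levels starting at 0, one offset, no region-of-interest mask) and is not the feature extraction pipeline of any reviewed study.

```python
def glcm(image, levels, offset=(0, 1)):
    """Symmetric, normalized gray-level co-occurrence matrix.

    image  : list of rows with integer gray levels in range(levels)
    offset : (row, column) displacement between the paired pixels
    """
    d_row, d_col = offset
    counts = [[0.0] * levels for _ in range(levels)]
    n_rows, n_cols = len(image), len(image[0])
    for r in range(n_rows):
        for c in range(n_cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < n_rows and 0 <= c2 < n_cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1.0
                counts[j][i] += 1.0  # count both directions (symmetric matrix)
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("no pixel pairs found for this offset")
    return [[value / total for value in row] for row in counts]


def contrast(p):
    """Weights co-occurrences by squared gray-level difference; higher means more local variation."""
    return sum(p[i][j] * (i - j) ** 2 for i in range(len(p)) for j in range(len(p)))


def energy(p):
    """Sum of squared probabilities; higher means a more uniform texture."""
    return sum(value * value for row in p for value in row)


homogeneous = glcm([[1, 1], [1, 1]], levels=2)
checkerboard = glcm([[0, 1], [1, 0]], levels=2)
```

In this toy example the uniform patch yields zero contrast and maximal energy, whereas the checkerboard yields higher contrast and lower energy, which is the sense in which such features are read as heterogeneity descriptors.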
A total of seven studies were found which combined MR radiomics with diffusion weighted imaging (DWI), perfusion weighted imaging (PWI) and/or diffusion kurtosis imaging (DKI) features to predict IDH mutation status in glioma patients (AUC = 0.747-0.931) [42,43,45-49]. Among these studies, three incorporated clinical-radiological parameters in their modelling [42,43,45] and one employed additional VASARI imaging features [49]. The best performance on an external cohort was achieved by Lu et al., who proposed two support vector machine (SVM) models based on MR and DWI features together with patient age and sex to predict IDH mutation status in GBM and LGG patients separately (accuracy = 88.9-91.7%, external validation) [43]. Similar to MRI radiomics, DWI and PWI textural and intensity features describing increased tumor heterogeneity were associated with IDH wild-type tumors. Moreover, IDH wild-type LGGs were found to have smaller minimum Apparent Diffusion Coefficient (ADC) and Cerebral Blood Volume (CBV) values, which could indicate an increased tumor proliferation index and increased malignancy [47,49].
Two studies were found which used PET radiomics in conjunction with clinical-radiological parameters to predict IDH status in gliomas. One of them used the fluoroethyltyrosine (FET)-PET standard parameter slope and one texture feature (accuracy = 80.0%, 10-CV) [52], while the other one combined fluorodeoxyglucose (FDG)-PET radiomics with age and tumor metabolism to achieve an AUC of 0.90 on an internal validation set [50]. Among FDG-PET radiomics, the feature sphericity was found to play a significant role in IDH mutation status prediction, indicating that IDH-mutant gliomas are less spherical than IDH wild-type gliomas in FDG-PET scans. Lastly, one study used APTw radiomics to predict IDH status in LGG patients (AUC = 0.84, internal validation) [53]. GLCM and GLRLM radiomic features describing tumor heterogeneity were identified as the main contributors to IDH genotype prediction, with IDH-mutant tumors being more homogeneous.
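For readers unfamiliar with shape descriptors, the sketch below shows the usual definitions of sphericity and of the surface-to-volume ratio mentioned earlier, given a tumor volume V and surface area A. The closed-form test shapes are our own illustrative choice; in practice both quantities are derived from the segmented region of interest, and the exact definitions used in the cited studies may differ.

```python
import math


def sphericity(volume, surface_area):
    """Ratio between the surface of a sphere with the same volume and the actual surface.

    Equals 1 for a perfect sphere and decreases for less spherical shapes.
    """
    return (math.pi ** (1.0 / 3.0)) * ((6.0 * volume) ** (2.0 / 3.0)) / surface_area


def surface_to_volume_ratio(volume, surface_area):
    """Surface area per unit volume; larger for small or irregular shapes."""
    return surface_area / volume


# Illustrative shapes: a sphere of radius 2 and a cube of side 2 (arbitrary units).
radius, side = 2.0, 2.0
sphere = sphericity(4.0 / 3.0 * math.pi * radius ** 3, 4.0 * math.pi * radius ** 2)
cube = sphericity(side ** 3, 6.0 * side ** 2)
```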

EGFR
In total, two studies were found which investigated the relationship between MR radiomics and EGFR alterations in glioma, more precisely, between EGFR over-expression in LGG patients and EGFR mutation in GBM patients [31,36]. The former proposed a logistic regression model based on 41 radiomics features (AUC = 0.95, internal validation) [36], and the latter study employed a symbolic regression method based on non-conventional MR spatial diversity descriptors (AUC = 0.845, cross-validation) [31]. In both cases, MR features describing tumor textural heterogeneity and shape irregularity were linked to EGFR, suggesting increased diversity in EGFR-mutated and EGFR-amplified tumors.
Three studies were found which employed MR radiomics together with DWI and PWI radiomics to predict EGFR mutation status in GBM patients. Binder et al. studied a variety of EGFR missense mutations and concluded that EGFR mutation at alanine 289 (EGFR-A289D/T/V) presented a unique radiographic phenotype. The authors reported significantly lower average T1 signal, higher relative cerebral blood volume and a longer major axis in EGFR-A289D/T/V-mutant tumors, among other features [39]. The two remaining studies investigated the prediction of EGFR mutation at exons 2-7 (EGFRvIII) and incorporated additional imaging features in their modelling, such as location parameters, tumor growth model parameters and the peritumoral heterogeneity index. The authors reported predictive accuracies of 73.58% [40] and 87% [41] on a temporally independent and on an internal validation cohort, respectively. The authors of the three above-mentioned studies suggested that EGFR-mutant tumors present an increase in shape variability and water concentration as well as a decreased cell density.

TP-53
Two studies evaluated the power of MR radiomics for TP-53 mutation status prediction in LGGs with a varying performance (AUC = 0.763-0.869, internal validation) [30,36]. One of them also included VASARI imaging features in the modelling. Authors concluded that TP-53 mutant gliomas are more heterogeneous and present higher water content.

VEGF
One study was found which investigated the use of conventional MR radiomics to predict VEGF expression in LGGs (AUC = 0.702, internal validation) [37].
Among PWI radiomics analyses, the highest predictive performance was achieved by Fan et al. by means of a logistic regression model based on 15 features from dynamic contrast enhanced MRI (DCE-MRI) images (AUC = 0.95, internal validation) [59]. Conventional MR features were also shown to be associated with HER-2 status in [64] (accuracy of 73.6%, training cohort). In the study using PET/CT, only the mean standardized uptake value (SUVmean) and total lesion glycolysis (TLG) were independently associated with HER-2 status (p = 0.021 and p = 0.046, respectively) [68]. DMG radiomic features were employed to predict HER-2 status in [69]. The authors reported higher prediction performance when employing a combination of bilateral craniocaudal and mediolateral oblique view images derived from 2D MG (AUC = 0.787, internal validation), compared to radiomic features from each view alone.

Ki-67
Three studies investigated the relationship between Ki-67 expression levels and PWI radiomics (AUC = 0.74-0.81) [56,57,62], with two of them employing pharmacokinetic radiomic features. The authors of two of the three studies suggested that tumors with high Ki-67 expression are associated with higher intra-tumoral heterogeneity. On the other hand, DWI radiomics were employed in [65], achieving a final AUC of 0.72 on an internal validation cohort. Another study combined both PWI and DWI radiomics to predict Ki-67 status and reported a final AUC of 0.81 after cross-validation by means of a multi-task learning model which was also trained to predict tumor grade [66].
The study using PET/CT could not find any radiomic features that were significantly associated with Ki-67 expression level [68]. Another study used DBT images and showed that a combination of the five most predictive features yielded an AUC of 0.698 for low versus high Ki-67 expression [67]. Liang et al. compared the utility of T1-weighted with contrast (T1 + C) and T2-weighted (T2w) radiomics to predict Ki-67 [63]. The analysis revealed that the T2w image-based radiomics classifier was significantly associated with Ki-67 expression on an external cohort (AUC: 0.740, 95% CI: 0.645-0.836), whereas T1 + C-based radiomics failed for the same dataset.
Among the eight studies exploring an association of Ki-67 expression levels with breast cancer radiomics, four studies employed a cut-off threshold of 14% [56,63,65,67], two studies used a threshold of 20% [62,68] and the other two did not specify any cut-off value.
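The practical consequence of using different cut-off thresholds can be illustrated with a small sketch: the same Ki-67 index may be labelled as high expression under one threshold and as low expression under another, which complicates the comparison of models across studies. Whether the threshold itself counts as "high" is an assumption made here for illustration; the reviewed studies may differ in this respect.

```python
def ki67_status(index_percent, cutoff_percent=14.0):
    """Dichotomize a Ki-67 index (in percent) into 'high' or 'low' expression.

    Assumption: values equal to the cut-off are labelled 'high'.
    """
    return "high" if index_percent >= cutoff_percent else "low"


# A tumor with a Ki-67 index of 15% changes class depending on the cut-off used.
label_at_14 = ki67_status(15.0, cutoff_percent=14.0)
label_at_20 = ki67_status(15.0, cutoff_percent=20.0)
```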

TP-53
The strongest association for breast cancer was found in PWI radiomics of 88 patients in which 13 radiomic features predicted TP-53 alterations with an AUC of 0.886 (95% CI: 0.817-0.955) after cross-validation [55].

[90,91], and TP-53 (n = 1) [84]. All studies showed a significant relationship between EGFR and radiomic features (AUC = 0.66-0.95). Two studies could not find an association with KRAS status, but all remaining studies found some linkage between radiomics and the respective biomarker (AUC = 0.66-0.99). In total, 21 studies validated their predictive models, two of which used external validation. None of the studies reported any radiomics quality measure, and only one of them was a registered prospective study [83]. A summary of the findings of this section can be found in Table 4.

ALK
For ALK mutation status, CT radiomics showed an AUC of 0.80 on a temporally independent validation cohort [87]. The selected radiomic features suggested that ALK-mutated tumors were denser. One study observed that PET-based radiomics combined with tumor stage and age was able to differentiate ALK/ROS1/RET fusion-positive and fusion-negative tumors (sensitivity = 0.73, specificity = 0.70) [98]. A study on 110 patients evaluated whether MR radiomics of brain metastases originating from lung cancer were associated with EGFR, ALK and KRAS mutations and reported excellent model performances for all three tissue biomarkers (AUC > 0.9, LOOCV) [102].
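Sensitivity and specificity, as reported above for the fusion-positive versus fusion-negative classification, follow directly from the confusion matrix. The sketch below uses made-up labels for illustration; 1 denotes a biomarker-positive case and 0 a biomarker-negative one.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return tp / (tp + fn), tn / (tn + fp)


# Illustrative labels: 4 positive and 5 negative cases.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(truth, predicted)
```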

PD-L1
PD-L1 expression levels were observed to be associated with CT radiomic features in two studies (AUC = 0.661 [89] and AUC = 0.848 [88], internal validations), indicating that dense and homogeneous tumors (without ground-glass opacity, necrosis, cavitation or calcification) were more likely PD-L1 positive in lung adenocarcinoma [89]. Radiomics from PET/CT imaging was found to be about as strongly predictive as CT radiomics and outperformed PET radiomics in PD-L1 expression level prediction for 399 stage I-IV non-small cell lung cancer (NSCLC) patients (AUC > 0.8, internal validation) [99].

TP-53
The association between CT radiomic features and TP-53 mutation was studied in [84]. Authors reported a final AUC of 0.604 on an internal validation cohort.

KRAS
The association of KRAS mutations with radiomic signatures was the most frequently assessed in gastrointestinal cancer. The strongest relationship was found in CE CT of CRC patients, where the mutation signature KRAS/BRAF/NRAS was significantly associated with three GLCM features (energy, maximum probability and sum average), achieving a final AUC of 0.829 on an internal validation cohort [106].
One group focused on the association of KRAS mutation with FDG-PET radiomics of pancreatic ductal adenocarcinoma patients [108], concluding that low-intensity textural features were significantly associated with KRAS gene mutational status (AUC = 0.794-0.82, training). The authors suggested that KRAS-mutated tumors were associated with higher intratumoral heterogeneity levels. The relationship between FDG-PET radiomics and KRAS mutation was also studied for CRC patients in [109]. KRAS-mutated tumors presented an increased value at the 25th percentile of the maximal SUV (SUVmax) of the metabolic tumor volume (MTV) as well as for the GLCM-derived contrast (AUC = 0.73-0.79, training).
Another study evaluated the association between KRAS mutation and CT imaging features, including hand-crafted and deep learning radiomics, of CRC patients [107]. The combined model achieved the highest performance (c-index = 0.831 (95% CI, 0.762-0.905), external validation), when compared to radiomics-alone and deep learning radiomics-alone models.
Two studies evaluated the association between T2w MR radiomics and KRAS mutational status in rectal cancer. In the first one, the authors reported a final AUC of 0.884 on the training cohort by means of a decision tree based on three textural features [111]. In the second study, seven features were shown to be associated with KRAS mutation status [110]. The best prediction model was obtained with SVM classifiers (AUC = 0.714 (95% CI, 0.602-0.827), external validation). Moreover, wavelet features derived from MR, PWI and DWI were associated with KRAS mutation in rectal cancer patients in [112], achieving a final AUC of 0.651 (95% CI, 0.539-0.763) on a temporally independent validation cohort.

TP-53
One group found that an increased value of short-run low gray-level emphasis derived from the GLRLM in FDG-PET/CT was predictive for TP-53 mutation in CRC patients (AUC = 0.71, training). Authors also reported higher heterogeneity and lower PET signal values in TP-53-mutant cases [109]. On the other hand, one study carried out with FDG-PET/CT data from pancreatic ductal adenocarcinoma patients did not see a significant association between genetic alterations in TP-53 status and the radiomic features extracted from the PET images [108].
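The GLRLM feature named above, short-run low gray-level emphasis (SRLGLE), can be illustrated with a small sketch. It counts runs of identical gray levels along image rows and weights each run by the inverse squares of its gray level and length, so that short runs of low intensity contribute most. The example is a simplified two-dimensional, single-direction version under our own assumptions (gray levels numbered from 1, horizontal runs only) and is not the implementation used in the cited study.

```python
def glrlm_horizontal(image, levels):
    """Run-length matrix m[g - 1][l - 1]: number of horizontal runs of gray level g and length l.

    image : list of rows with integer gray levels in 1..levels
    """
    max_run = len(image[0])
    matrix = [[0] * max_run for _ in range(levels)]
    for row in image:
        run_value, run_length = row[0], 1
        for value in row[1:]:
            if value == run_value:
                run_length += 1
            else:
                matrix[run_value - 1][run_length - 1] += 1
                run_value, run_length = value, 1
        matrix[run_value - 1][run_length - 1] += 1  # close the last run of the row
    return matrix


def srlgle(matrix):
    """Short-run low gray-level emphasis of a run-length matrix."""
    n_runs = sum(sum(row) for row in matrix)
    weighted = sum(
        matrix[g][l] / (((g + 1) ** 2) * ((l + 1) ** 2))
        for g in range(len(matrix))
        for l in range(len(matrix[0]))
    )
    return weighted / n_runs


long_runs = srlgle(glrlm_horizontal([[1, 1], [2, 2]], levels=2))
short_runs = srlgle(glrlm_horizontal([[1, 2], [1, 2]], levels=2))
```

The image made of short runs yields the larger value, which is consistent with reading an increased SRLGLE as a finer, more fragmented low-intensity texture.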

HER-2
The association of HER-2 status and CT radiomics in gastric cancer patients was investigated in [103]. The authors reported a final AUC of 0.771 (95% CI, 0.607-0.934) on an internal validation cohort when employing a nomogram based on seven wavelet features and the patient carcinoembryonic antigen (CEA) level. One study extracted radiomic features from pre-operative MR images of patients suffering from rectal cancer, achieving a final AUC of 0.696 (95% CI, 0.610-0.782) on a temporally independent validation cohort [112].

Ki-67
Three studies investigated the potential association of the Ki-67 index with radiomic signatures [104,105,112]. A CE CT-based radiomics nomogram including six radiomic features for gastrointestinal stromal tumors was significantly associated with Ki-67 (AUC = 0.754, external validation) [104]. Another retrospective, multicenter study in CE CT focused on pancreatic neuroendocrine tumors showed a significant association between Ki-67 and a combined eight-feature radiomics signature [105]. The third study analyzed a combination of MR, PWI and DWI radiomics to predict Ki-67 expression, with a final AUC of 0.699 on a temporally independent cohort [112]. Different Ki-67 expression cut-off values were used in each study, ranging from 10 to 40%.

BRAF
As explained in the KRAS biomarker subsection, one study investigated the relation between CE CT radiomics and the mutation signature KRAS/NRAS/BRAF together, which reported a final AUC of 0.829 on a temporally independent cohort [106].

Liver Cancer

Summary
Four studies were found which associated radiomics and tissue biomarkers in liver cancer patients, using either MR with contrast agents (n = 2) or US (n = 2). The most common tumor type was hepatocellular carcinoma (HCC, n = 3) [113-115] followed by cholangiocarcinoma (CCA, n = 1) [116]. Three tissue biomarkers were investigated: Ki-67 (n = 3), PD-L1 (n = 2) and VEGF (n = 1), and all were shown to be significantly correlated with radiomics (AUC = 0.85-0.97). All studies employed a dataset limited to a single center; one study separated the dataset into a training and a validation cohort [116]. None of the studies reported any radiomics quality measure and only one of them was a registered prospective study [114]. A summary of the findings of this section can be found in Table 6.

PD-L1
The best predictive performance overall for liver studies was obtained for PD-L1 in US images of HCC patients (AUC = 0.97, cross-validation) [115]. The expression of PD-L1 was also predicted from MRI images of HCC, where the best association was found with the texture feature ADC variance. This may be interpreted as a correspondence between higher heterogeneity and higher PD-L1 expression levels [113].

Ki-67
The best AUC for Ki-67 expression prediction in HCC was obtained in [115] by means of an SVM model based on US radiomic features (AUC = 0.94, cross-validation). A slightly lower performance (AUC = 0.804, internal validation) was obtained with US wavelet features for CCA patients in [116]. Another group employed texture features from MR images of HCC patients [114]. The authors combined 13 features from T2w, pre-contrast (PRE), arterial phase (AP) and portal venous phase (PV) scans into a multiparametric texture signature which achieved a c-index of 0.878 after cross-validation. The features included suggested that higher intra-tumor heterogeneity correlates with higher expression of Ki-67. The latter may reflect the cell proliferation status and therefore tumor aggressiveness.

VEGF
The relationship between VEGF expression and US radiomic features was analyzed only in CCA patients [116]. Wavelet features were found to be the most relevant feature type to predict the biomarker expression (AUC = 0.864, internal validation). These were associated with the heterogeneity of the tumor volume by the authors.
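Wavelet features are computed on filtered versions of the image that separate coarse content from fine detail at different orientations. As a hedged illustration, the sketch below applies a single-level two-dimensional Haar transform, which is the simplest wavelet and was chosen here for clarity; the reviewed studies do not necessarily use this wavelet family. The band names are our own: `approx` holds local averages, whereas the three detail bands respond to intensity changes between rows, between columns and along diagonals.

```python
def haar2d_level1(image):
    """Single-level orthonormal 2D Haar transform of an image with even dimensions.

    Returns a dict with the sub-bands 'approx', 'row_detail', 'col_detail', 'diag_detail'.
    """
    n_rows, n_cols = len(image), len(image[0])
    if n_rows % 2 or n_cols % 2:
        raise ValueError("image dimensions must be even")
    bands = {"approx": [], "row_detail": [], "col_detail": [], "diag_detail": []}
    for r in range(0, n_rows, 2):
        rows = {name: [] for name in bands}
        for c in range(0, n_cols, 2):
            p00, p01 = image[r][c], image[r][c + 1]
            p10, p11 = image[r + 1][c], image[r + 1][c + 1]
            rows["approx"].append((p00 + p01 + p10 + p11) / 2.0)
            rows["row_detail"].append((p00 + p01 - p10 - p11) / 2.0)   # top vs. bottom row
            rows["col_detail"].append((p00 - p01 + p10 - p11) / 2.0)   # left vs. right column
            rows["diag_detail"].append((p00 - p01 - p10 + p11) / 2.0)  # diagonal pattern
        for name in bands:
            bands[name].append(rows[name])
    return bands


def band_energy(band):
    """Sum of squared coefficients of one sub-band."""
    return sum(value * value for row in band for value in row)


flat = haar2d_level1([[3, 3], [3, 3]])
textured = haar2d_level1([[1, 2], [3, 4]])
```

A uniform patch places all of its energy in the approximation band, while a textured patch distributes energy into the detail bands; radiomic wavelet features are then statistics or texture descriptors computed on such sub-bands.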

Summary
In total, five studies were found which investigated the correlation between radiomics and molecular markers in other entities not included in the sections above: melanoma (n = 1) [117], thyroid cancer (n = 1) [118], head and neck cancer (n = 2) [119,120] and adrenal gland carcinoma (n = 1) [121]. All studies showed a significant correlation between the biomarker and radiomics (AUC = 0.62-0.78). None of the studies used external validation. None of the studies reported any radiomics quality measure, nor were they registered prospective studies. A summary of the findings of this section can be found in Table 7.

Details
One study explored the use of FDG-PET/CT radiomics to predict BRAFv600 mutation status in melanoma patients achieving a final AUC of 0.62 after 10-CV [117]. Another study investigated the use of US radiomics to predict BRAFv600 mutation of thyroid cancer patients with a limited predictive performance on a temporally independent validation cohort (c-statistics = 0.629) [118]. Two studies explored the association of different biomarkers and imaging features in head and neck squamous cell carcinoma patients. One of them reported a moderate predictive power of CT radiomics for TP-53 mutation prediction (AUC = 0.641, 5-CV) [119], while the other study reported a limited linkage between PD-L1, VEGF, Ki-67 and EGFR expression and FDG-PET radiomics on their training cohort [120]. The latter also showed a positive correlation between PD-L1 and Ki-67 expression. The GLCM-derived feature of correlation was found to be a negative predictor of PD-L1 expression, while it was positively associated with VEGF expression. One study investigated the efficacy of CE CT radiomics to predict Ki-67 expression in adrenal gland carcinoma patients [121]. The authors reported final AUCs of 0.7-0.78 on the training cohort after using logistic regression models based on two shape features, suggesting that high Ki-67 expression is associated with flatter and more elongated tumors.

Feature Interpretation
In Tables 8-10, we gathered those radiomic features employed in the best performing models for each combination of biomarker and tumor site, for MRI, CT and PET, respectively. Detailed tables including feature names and additional modalities (e.g., US or advanced MRI sequences) are shown in Supplementary Tables S1-S4. For seven studies, no interpretation was possible due to lack of information.

Ki-67 high expression: more heterogeneous [34]
TP-53+: higher intensity [36]
VEGF+: more heterogeneous [37]
IDH+: more homogeneous, more regularly shaped [21]
GI, KRAS+: more heterogeneous [110]
Liver, Ki-67 high expression: more heterogeneous [114]

Table 9. Interpretation of the best performing models on the training dataset for computed tomography (CT). Acronyms: head and neck cancer (HNC), gastrointestinal (GI), anaplastic lymphoma kinase (ALK), v-raf murine sarcoma viral oncogene homolog B1 (BRAF), epidermal growth factor receptor (EGFR), human epidermal growth factor receptor 2 (HER-2), Kirsten rat sarcoma viral oncogene homolog (KRAS), programmed cell death ligand 1 (PD-L1), tumor protein p53 (TP-53).

PET EGFR
Ki-67 high expression: more elongated and flatter [121]

Oftentimes, dysregulation of one specific biomarker led to a similar tumor phenotype across entities and imaging modalities. This was the case for EGFR-mutant tumors, which exhibited greater textural heterogeneity in CNS MRI, PWI and DWI, as well as in lung CT and PET. Similarly, alteration of TP-53 status was associated with increased heterogeneity in CT of head and neck cancer and PET of colorectal cancer. IDH-mutant tumors were reported to have greater textural homogeneity in MRI, DWI, PWI, DKI and FDG-PET in CNS. Tumors with high Ki-67 expression were reported to be more homogeneous in CT for lung cancer but more heterogeneous for gynecological tumors and head and neck tumors. KRAS+ tumors were shown to be more homogeneous in CT for lung cancer, but more heterogeneous for gastrointestinal cancer.

Results per Biomarker
An overview of the analyzed studies per biomarker can be found in Tables S5-S14 in the Supplementary Materials.

Discussion
In recent decades, extensive genomic studies have leveraged our understanding of cancer biology and pathophysiology. The identification of key genetic alterations that drive oncogenesis and their subsequent molecular markers has led to a more accurate and comprehensive patient-specific treatment planning and adaptation [2]. Furthermore, the field of radiomics, i.e., the quantitative, high-throughput analysis of medical images, has emerged as a potential diagnostic, prognostic and predictive tool in clinical decisionsupport systems. This is of particular interest in cancer treatment, where medical imaging is routinely performed with diagnostic and monitoring purposes. Nonetheless, the reliability, clinical applicability and biological meaning of radiomics models and imaging biomarkers has to be extensively validated before they can be incorporated into clinical routine [17]. Hence, the primary objective of this review was to identify key radiomic features associated with specific tumor molecular markers through an electronic search of peer-reviewed journal publications.
For this purpose, we limited our search to ten cancer biological endpoints that are commonly investigated and used in clinical practice and that apply to a broad range of cancer types. Other biomarkers, although valid, such as methylation status or indicators for virus-borne cancers, were deliberately excluded, as their origin and/or the mechanism leading to malignant transformation of healthy cells is not trivially comparable. Further examples of biomarkers excluded from this review are the loss of tumor suppressors such as breast cancer genes 1 and 2 (BRCA-1, BRCA-2), RNAs, proteins such as prostate-specific antigen (PSA) and circulating tumor DNA (ct-DNA) [1]. By focusing on this compact set of biomarkers, we aimed to summarize the reported associations between radiomics and signature molecules and eventually contribute to the promotion of radiomics as a valid diagnostic, prognostic and predictive tool in cancer treatment. We are aware that the selection of biomarkers is not complete; however, given the sheer number and variability of biomarkers, the search had to be narrowed in order to perform a meaningful systematic review.
Most of the studies included in this review reported some association between the selected biomarkers and radiomics, suggesting that mutated and non-mutated tumors have different growth patterns that are identifiable in high-throughput imaging. The association of textural, intensity, shape, size and wavelet image features with tumor biomarkers entails an advance in feature interpretability, as shown in Tables 8-10, which brings radiomics closer to its application in a clinical setting.
In total, 96 out of 104 studies found a significant relationship between at least one of the studied biomarkers and one or more radiomic features. However, only 7 studies validated their models on external cohorts, 11 studies on temporally independent cohorts and 14 studies did not use any form of validation. Additionally, only 7 out of 104 included a prospectively collected dataset, which is necessary to confirm the clinical validity and usefulness of any radiomics signature. Along these lines, we believe greater effort should be made to employ larger, multi-institutional cohorts, either by means of new data-sharing agreements among research groups or through distributed learning. The feasibility of the latter has already been shown in a number of studies and entails new possibilities for training reliable radiomics models [122,123]. Furthermore, only 37 studies performed some type of robustness analysis of the selected features. Different image acquisition parameters, scanner models, pre-processing and region of interest segmentation techniques among other factors have been shown to significantly affect feature robustness and results reproducibility, and should be evaluated in greater detail [124][125][126]. Moreover, we would like to encourage projects such as the image biomarker standardization initiative (IBSI) [127], which works towards the homogenization of image feature extraction and analysis.
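To put the counts in this paragraph in proportion, the following minimal sketch converts them into percentages of the 104 included studies. The counts are taken directly from the text above; the variable and function names are illustrative assumptions and are not part of the original analysis. The categories are not assumed to be mutually exclusive, so the percentages are not expected to sum to 100.

```python
# Study counts as reported in the paragraph above (104 included studies).
TOTAL_STUDIES = 104

# Labels are illustrative; the counts come from the text.
reported_counts = {
    "significant association found": 96,
    "external validation": 7,
    "temporally independent validation": 11,
    "no validation": 14,
    "prospective dataset": 7,
    "robustness analysis": 37,
}


def as_percentages(counts, total):
    """Return each count as a percentage of `total`, rounded to one decimal."""
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


percentages = as_percentages(reported_counts, TOTAL_STUDIES)

# Print a small summary table: label, raw count and share of all studies.
for label, pct in percentages.items():
    count = reported_counts[label]
    print(f"{label:<36s}{count:>4d} / {TOTAL_STUDIES}  ({pct:.1f}%)")
```

Read this way, fewer than one in ten of the included studies used external validation or prospectively collected data, which is the gap the call for larger, multi-institutional cohorts is meant to address.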
Another factor that hindered the interpretation of results and the comparison of studies was the great variability in the biomarker expression levels employed as cut-offs to stratify patients. Currently, there is a lack of standardization of immunohistochemistry techniques for biomarker staining and of scoring systems, leading to moderate intra/inter-laboratory and intra/inter-observer variabilities [1,128]. This could potentially explain the observed phenotype disagreement across different entities and modalities for the Ki-67, PD-L1 and KRAS biomarkers, as described in Tables 8-10. However, as previously explained, it should be noted that these studies were included in the interpretation tables based on their performance on the training set, and, for the vast majority, external validation remains to be accomplished.
In an attempt to standardize the clinical utility evaluation of radiomics studies, as well as to increase transparency and minimize risk of bias, two rigorous reporting guidelines, the TRIPOD [17] and the RQS [4] scores, have been devised. In Tables 2-7, we gathered some of the most relevant reporting criteria, such as the type of validation used, the performance of feature reduction and robustness analysis, the use of discrimination statistics, the inclusion of non-radiomic features and the public availability of the code and/or data. However, none of the studies included in this review explicitly followed the TRIPOD or RQS guidelines. We would like to encourage the use of such guidelines, as they provide a common framework to compare state-of-the-art results in radiomics and bring its incorporation into clinical decision-support systems closer.

Conclusions
In summary, radiomics from different modalities and cancer entities is a promising tool for tumor biology assessment. Nevertheless, a large majority of studies included in this review only employed internal validation datasets or bootstrap and cross-validation techniques to assess model performance. Thus, further multi-center, prospective studies are required to validate the reported outcomes. Moreover, none of the studies followed any reporting or quality assurance protocols. Hence, we would like to encourage the employment of reporting guidelines such as TRIPOD and RQS scores, as well as the use of IBSI-standardized radiomics software. As a closing remark, we would like to emphasize the utmost importance of transparency to ensure the reproducibility of radiomics studies.

Supplementary Materials:
The following are available online at https://www.mdpi.com/article/10.3390/cancers13123015/s1, List S1: Queries employed in the PubMed search, List S2: PRISMA Checklist, Table S1: Feature interpretation for ALK, BRAF and EGFR, Table S2: Feature interpretation for HER-2, IDH and Ki-67, Table S3: Feature interpretation for KRAS, KRAS/BRAF and PD-L1, Table S4: Feature interpretation for TP-53 and VEGF, Table S5: An overview of the radiomic studies included for IDH biomarker, Table S6: An overview of the radiomic studies included for EGFR biomarker, Table S7: An overview of the radiomic studies included for VEGF biomarker, Table S8: An overview of the radiomic studies included for HER-2 biomarker, Table S9: An overview of the radiomic studies included for ALK biomarker, Table S10: An overview of the radiomic studies included for BRAF biomarker, Table S11: An overview of the radiomic studies included for PD-L1 biomarker, Table S12: An overview of the radiomic studies included for TP-53 biomarker, Table S13: An overview of the radiomic studies included for KRAS biomarker and Table S14: An overview of the radiomic studies included for Ki-67 biomarker.

Author Contributions:
The processes of screening, eligibility evaluation and extraction of data for the meta-analysis were performed independently by the authors A.L.G.S.-E., D.V., F.T., R.D.B. and V.W. The authors J.E.v.T., S.T.-L., M.P. and M.G. supervised the process. All authors contributed to writing and reviewing the manuscript. All authors have read and agreed to the published version of the manuscript.
Funding: This work was partially supported by the Swiss National Science Foundation (310030_173303, 310030_172885 and CRSII5_183478), the Klinischer Forschungsschwerpunkt (KFSP) Artificial Intelligence in Oncological Imaging from the University of Zurich and the Swiss Personalized Health Network (SPHN) IMAGINE.