Role of Artificial Intelligence in Radiogenomics for Cancers in the Era of Precision Medicine

Simple Summary: Radiogenomics has recently played a significant role in oncology, offering a new understanding of cancer biology and of tumor behavior in response to standard therapy. It also enables more precise prognosis, investigation, and analysis of a patient's cancer. Over the years, Artificial Intelligence (AI) has lent considerable strength to radiogenomics. In this paper, we present computational and oncological perspectives on the role of AI in radiogenomics, together with its achievements, opportunities, and limitations in current clinical practice.

Abstract: Radiogenomics, a combination of "Radiomics" and "Genomics" using Artificial Intelligence (AI), has recently emerged as a state-of-the-art science in precision medicine, especially in oncology care. Radiogenomics combines large-scale quantitative data extracted from radiological medical images with personalized genomic phenotypes. Through various AI methods, it builds prediction models to stratify patient risk, monitor therapeutic approaches, and assess clinical outcomes. It has recently shown tremendous achievements in prognosis, treatment planning, survival prediction, heterogeneity analysis, recurrence, and progression-free survival in human cancer studies. Although AI has shown immense performance across various clinical aspects of oncology care, it still faces several challenges and limitations. This review provides an overview of radiogenomics with viewpoints on the role of AI in terms of its promise for both computational and oncological aspects, and discusses achievements and opportunities in the era of precision medicine. The review also presents recommendations to diminish these obstacles.


Introduction
Cancer is the second leading cause of death worldwide, after cardiovascular diseases, accounting for nearly 10 million deaths in 2020. According to World Health Organization (WHO) statistics, the most common cancers are those of the breast, lung, colorectum, prostate, skin, brain, and stomach [1]. The cancer burden continues to grow globally, exerting tremendous physical, emotional, and financial strain on individuals, families, communities, and health systems. Countries with mediocre or poor health infrastructure cannot provide timely, quality diagnosis and treatment to a large number of patients [2].
In the era of precision medicine, molecular characterization of cancer using genomic technology is essential [3,4]. Significant progress has been observed in molecular characterization over the last few years. However, due to its technical complexity, cost, and turnaround time, large-scale genome-based characterization has not yet been routinely adopted for all types of cancer [5][6][7]. In existing clinical practice, because of the heterogeneous behavior of cancer, molecular profiling is often limited, and tumor heterogeneity is frequently missed when only a portion of the cancer is examined [8]. Throughout treatment, determination of molecular targets requires ex vivo postoperative analysis of the resected tumor or a biopsy sample. This restricts the assessment of a tumor's spatial and temporal heterogeneity and makes it impossible to track the molecular transformation of cancer continuously [9]. Additionally, in solid tumors, the functional, anatomic, and physiological properties of the whole tumor may not be fully reflected in histopathological samples [10,11]. Researchers worldwide have recognized the substantial role of medical imaging in clinical treatment decision-making and in analyzing cancers [12]. Earlier, its role was largely restricted to prognosis and staging [13]. Recently, however, imaging-derived markers obtained from clinical images have been investigated extensively to deliver insight into cancer non-invasively. Most importantly, imaging helps characterize the peritumoral regions, as these regions are not always resected for the molecular characterization of cancer [14,15].
Recently, radiogenomics, the combination of "Radiomics" and "Genomics," has significantly drawn the attention of researchers to determine imaging surrogates for genomic signatures and to advance biomarkers leveraging the numerous data types used to characterize cancer. These biomarkers can be used in different clinical decision-making such as survival prediction, tumor progression, reoccurrence, and heterogeneity analysis.
In "Radiomics," quantitative medical imaging features are extracted computationally to capture imaging phenotypes that are not easily noticed with the naked eye [16]. Recent research has demonstrated that a cancer's molecular information is linked with its imaging phenotype [17]. The fundamental steps in radiomics include image data acquisition; preprocessing of the image data, such as filtering [18]; region-of-interest (ROI) segmentation [19,20]; extraction of different types of features [21,22]; and, finally, use of these extracted features for appropriate analysis. ROI extraction from the imaging data can be performed manually or semi-/fully automatically using computational algorithms, subject to approval by an expert neuropathologist or neuro-oncologist. Quantitative feature extraction covers histogram-based and first-, second-, or higher-order features [23][24][25][26]. Recently, high-level features obtained from deep learning have also been widely used to analyze cancer regions [27][28][29]. In "Genomics," the human genome is examined to analyze cancer by extracting genomic features (genotypes). The genotype essentially conveys the genetic information of the cancer.
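As a toy illustration of the first-order (histogram-based) features mentioned above, the sketch below computes mean, variance, skewness, and histogram entropy over the voxels inside a hypothetical ROI mask. The synthetic image, mask, and feature set are illustrative placeholders, not a validated clinical implementation.

```python
import numpy as np

def first_order_features(image: np.ndarray, roi_mask: np.ndarray, n_bins: int = 32) -> dict:
    """Compute a few first-order radiomics features from the voxels inside the ROI."""
    voxels = image[roi_mask.astype(bool)]
    hist, _ = np.histogram(voxels, bins=n_bins)
    p = hist / hist.sum()                      # discrete intensity distribution
    p_nz = p[p > 0]                            # drop empty bins to avoid log(0)
    mean = voxels.mean()
    var = voxels.var()
    skew = ((voxels - mean) ** 3).mean() / (var ** 1.5 + 1e-12)
    entropy = -(p_nz * np.log2(p_nz)).sum()    # Shannon entropy of the histogram
    return {"mean": mean, "variance": var, "skewness": skew, "entropy": entropy}

# Synthetic 2D "image" with a brighter square region standing in for a tumor ROI
rng = np.random.default_rng(0)
img = rng.normal(100, 10, size=(64, 64))
mask = np.zeros((64, 64))
mask[20:40, 20:40] = 1
img[20:40, 20:40] += 50                        # the ROI is systematically brighter

feats = first_order_features(img, mask)
print(feats)
```

In a real pipeline, such features would be computed with a dedicated library over expert-approved segmentations; this sketch only conveys the idea of turning ROI intensities into a numeric phenotype vector.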
These radiomics and genomics features are then used by different AI-based methods to characterize and analyze cancer. In recent years, AI has delivered data-driven analysis models that have driven noteworthy progress in information-processing methods for the radiogenomics of cancer. There have been constant, incremental efforts to improve AI's analytic efficiency so that it can be endorsed for clinical practice [30,31]. The discovery of artificial neural networks (ANNs) and their subsequent development [32,33] produced the computational learning paradigms of machine and deep learning, which are chiefly responsible for the development of AI in the field of radiogenomics.
In this article, our main objective is to discuss different perspectives on the contemporary and inherent role of AI methods in the radiogenomics of cancer, including current challenges and prospects. We begin by providing an insight into radiogenomics and its achievements. We then discuss the opportunities provided by AI and how it has been used significantly in recent cancer studies, supported by an analytical investigation. Finally, we conclude with an overall perspective on using AI in the radiogenomics of cancer as applied to clinical decision-making in the era of individualized medicine and care.

Search Strategy and Statistics of Radiogenomics Studies
This section deals with the search strategy using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) model, followed by statistical distribution and analysis of various radiogenomics studies.

The PRISMA Model
This narrative review has been designed to analyze the role and impact of AI in radiogenomics studies. The PRISMA model was adopted for this purpose, as shown in Figure 1. A detailed search was performed using major academic search databases such as Google Scholar, PubMed, IEEE Xplore, ScienceDirect, Springer, and MDPI. The keywords used for the search included "radiogenomics", "radiomics", "genomics", "radiogenomics using AI", "machine learning for radiogenomics", and "deep learning for radiogenomics". A total of 154 records were collected. Duplicate records were removed using the "Find Duplicates" feature in EndNote software by Clarivate Analytics, leaving 104 articles. Three exclusion criteria (marked E1, E2, and E3 in Figure 1) removed 23, 20, and 10 articles, respectively, in the categories of (i) non-relevant articles, (ii) studies not related to AI, and (iii) articles with insufficient data. Finally, 51 relevant articles were used for the qualitative synthesis to assess the impact and role of AI in radiogenomics studies.

Statistical Distributions of AI Attributes of Radiogenomics Studies

Statistical Distribution of Publication Trends of Radiogenomics Using AI
Since radiogenomics is a new domain of cancer research, the number of publications was low in the initial stages; however, it has continued to grow over the past few years and has been emerging for the past five years, as shown in Figure 2a. Although the number of retrieved publications remains modest, it is expected to increase in the near future, as radiogenomics is applicable to all kinds of cancer research and provides an extra edge over other methodologies.

Statistical Distribution of Country-Wise Study of Radiogenomics Using AI
As this is a current and trending topic in cancer research, publications have appeared across the globe. Since radiogenomics is an emerging domain of oncology research, there is evident curiosity about which countries are the leading contributors. Figure 2b depicts a pie-chart distribution of country-wise research publications in the set of radiogenomics studies we considered. It shows that the USA and China are the leading contributors, at 39% and 31%, respectively.

Statistical Distribution of AI and Its Model Used in Radiogenomics Studies
Artificial Intelligence has successfully served every domain of computer-vision application in the healthcare industry, and it has helped automate radiogenomics studies. Both machine learning (ML) and deep learning (DL), under the umbrella of AI, take radiogenomics studies to the next level of precision in performance. Under ML, traditional radiomics features are extracted, while under DL models, automatically learned deep features help the model better classify genomic status from radiology. Our findings indicate that ML has been used slightly more often than DL, as shown in Figure 3a. The frequency of the various ML and DL models used, including convolutional neural network (CNN), regression, random forest (RF), support vector machine (SVM), ResNet, XGBoost, VGG, naïve Bayes, artificial neural network, DenseNet, GoogleNet, k-NN, decision tree, and linear discriminant analysis (LDA), is shown in Figure 3b.

Statistical Distribution of Image Modalities Used in Radiogenomics
MRI, CT, and PET are the prominent imaging modalities in radiogenomics studies, as shown in Figure 4a, with their corresponding shares in percentages given in the pie chart. MRI has the greatest share, at 45%; it can be used for cancer at any anatomical site but is especially preferred for brain-tissue characterization. CT imaging follows with a 39% share among the radiogenomics studies in this review, while the remainder use combinations of MRI, CT, and PET. Additionally, mammography is an important imaging modality for breast cancer.

Statistical Distribution of Anatomical Area of Cancer in Radiogenomics
The various cancer types, grouped by anatomical area, considered in this radiogenomics review are depicted in Figure 4b. Among them, brain, breast, and lung cancer are the most frequently analyzed for radiogenomics compatibility, at 23%, 14%, and 15%, respectively. However, radiogenomics applies to all cancer types that develop from gene mutations. The other prominent cancer types considered here include liver, ovarian, colorectal, gastric, prostate, kidney, head and neck, and skeletal-muscle cancers, as shown in Figure 4b.

Statistical Distribution of Dataset Used in Radiogenomics
The dataset size considered for the radiogenomics studies in this review refers to the number of patients in the corresponding study, across all objectives and modalities. As radiogenomics is a relatively new field of research, datasets, even modest ones, are not easily available for public use. All the studies considered used datasets of fewer than roughly 1000 subjects, and a few were limited to below 100 subjects, as depicted in Figure 5. Larger datasets are desirable for better evaluation of a radiogenomics AI system, to avoid data imbalance and over-fitting.

Performance Analysis of Radiogenomics Studies
Performance evaluation is the final and essential part of an AI-based diagnosis system: the higher the values of the evaluation parameters, the better the system. Most radiogenomics studies have adopted accuracy and area under the receiver-operating-characteristic curve (AUC) as the performance evaluation parameters, while sensitivity, specificity, precision, and other statistical measures have been used less often. Across the radiogenomics studies, the mean ± standard deviation (SD) was 84.34% ± 9.37 for accuracy and 85.42% ± 7.95 for AUC, as shown in Figure 6.

An Insight into Radiogenomics
The following subsections describe the components of radiogenomics. The entire pipeline is also presented, depicting different modules in radiogenomics.

Conventional and Deep Radiomics
Radiomics deals with the mining and extraction of quantitative medical-imaging features that support clinical assessment by expanding the prognostic, diagnostic, and predictive precision of disease. Radiomics can be applied to medical imaging of any disease; however, it is gaining particular importance in cancer research for personalized treatment. It has been applied quite successfully to cancerous images of all anatomical areas across multiple modalities such as MRI, CT, PET, and ultrasound (US). Each modality has its own peculiarities for tissue-level radiography of the various anatomical sections. Conventional radiomics models primarily depend on explicitly hand-crafted features from radiological images [34,35]. These features span texture, geometric, intensity, shape, histogram, dynamics-curve, angiogenesis, metabolic, morphological, spatial, and statistical features, as well as some high-dimensional features [36][37][38][39]. Each feature has special importance for defining the imaging phenotype and revealing key components of the tumor phenotype. The most prominent texture features describe the pattern and spatial arrangement of intensities in the tumor. The geometric features describe the 3D shape, size, location, and dynamics-curve characteristics of the tumor image. The intensity features capture the pixel or voxel intensities within the tumor image.
In recent years, however, the development of deep-learning technologies in computer vision has reshaped the practice of radiomics. The automatic feature extraction of deep radiomics helps find relevant and useful features at large scale. Deep radiomics features depend on the network's depth, with stacks of convolutional and fully connected layers [40]. The automatic process of deep radiomics also includes automatic feature selection, which traditional radiomics may lack. Both traditional and deep radiomics extract the phenotype information of cancer needed for tumor diagnosis, prognosis, and personalized cancer treatment. Figure 7 shows the differences between extracting phenotype information via traditional radiomics and via deep-learning methods. The next subsection describes the role and significance of genomics in cancer research.
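To make the contrast with hand-crafted features concrete, the minimal numpy sketch below shows how one convolutional layer (random, untrained filters followed by ReLU and global average pooling) turns an image patch into an automatic "deep" feature vector. Real deep-radiomics models stack many learned layers; everything here (patch, filter bank, sizes) is a synthetic stand-in.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 'valid'-mode 2D cross-correlation of one image with one kernel."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (image[i:i + kh, j:j + kw] * kernel).sum()
    return out

def deep_features(image, kernels):
    """One conv layer + ReLU + global average pooling -> one feature per kernel."""
    feats = []
    for k in kernels:
        fmap = np.maximum(conv2d_valid(image, k), 0.0)   # ReLU non-linearity
        feats.append(fmap.mean())                        # global average pool
    return np.array(feats)

rng = np.random.default_rng(1)
patch = rng.normal(size=(16, 16))        # stand-in for a tumor image patch
bank = rng.normal(size=(8, 3, 3))        # 8 random 3x3 filters (untrained)
vec = deep_features(patch, bank)
print(vec.shape)                         # an 8-dimensional deep-feature vector
```

In trained networks the filters are learned from data rather than random, which is precisely what lets deep radiomics discover phenotype-relevant features without manual design.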

Significance of Genomics Study in Cancer Research
Genomic study of cancer is a relatively new area that benefits from recent advances in technology for examining the human genome, which comprises the entire set of DNA. By comparing the DNA and RNA sequences of cancer cells with those of normal tissue, scientists and researchers identify genetic differences that could be the root cause of cancer [41].
Cancer is fundamentally caused by the uncontrolled proliferation of cells [42]. DNA is the central control system of the cell, carrying its genetic features (genotype) and defining the cell's behavior. Uncontrolled cell growth may involve DNA mutations, deletions, rearrangements, amplifications, and the addition or removal of chemical marks. The genotype is essentially the genetic constitution of an individual organism. Prominent genes in the human body include isocitrate dehydrogenase (IDH), tumor protein 53 (TP53), epidermal growth-factor receptor (EGFR), and O6-methylguanine-DNA methyltransferase (MGMT) [43]. These genes have wide-ranging functions in the body's nourishment and growth; TP53, for example, encodes a protein that helps with DNA repair and the regulation of cell growth. Owing to external factors, alteration of these genes can cause fatal cancers whose severity depends on the effects of the mutations. Currently, next-generation sequencing (NGS) is an emerging technology for determining the sequence of DNA or RNA to study the genetic variation correlated with cancers and other biological phenomena. It enables rapid identification of common and rare genetic variants through genome sequencing, investigation, and identification. Table 1 lists some essential genotypes along with their functions within the body and the effects of their alteration in different cancer types.

Overall Flow of Radiogenomics
So far, we have discussed radiomics and genomics, and their functionality, separately. The workflow of a radiogenomics ("Radiomics" + "Genomics") study (Figure 8) can be broadly partitioned into five stages: (1) image acquisition followed by image preprocessing; (2) feature extraction and dimensionality reduction (selection), e.g., via PCA [38]; (3) association of radiomics and genomics features; (4) data analysis; and (5) the radiogenomics outcomes. All prominent cancer types of the various anatomical sections, such as brain, lung, breast, kidney, liver, prostate, bladder, colorectal, gastric, pancreatic, ovarian, head and neck, and retinoblastoma, can be studied through radiogenomics.
Figure 8. Radiogenomics pipeline of five stages: data acquisition (radiological imaging), preprocessing, feature (low- and high-level) extraction and selection, association of radiomics and genomics, analysis, and, finally, the radiogenomics outcome [8].

Different Stages of Radiogenomics:
(i) Data acquisition and preprocessing: Image acquisition in cancer patients is a tedious task due to the severity of the patient's condition [14]. Nevertheless, several medical-imaging modalities, such as CT, MRI, PET, and ultrasound [44,45], can locate and visualize cancer [46,47]. Each modality has its own peculiarities for tissue-level radiography of the various anatomical sections. The corresponding genomic data of cancer patients are collected as part of the genomics study. Preprocessing is an integral part of handling medical images and typically involves bias-field correction, normalization, pixel or voxel resampling, and image registration [48,49]. Data handling, such as correcting class imbalance, data augmentation, randomization, and standardization, is likewise important for radiomics data from cancerous images [47]. The initial stage of a radiogenomics study requires the region of interest (ROI) of the radiomics data, where the exact radiomics features of the cancer reside. ROI delineation is a crucial step because of the unclear margin, shape, size, and location of the tumor. Preprocessing, data handling, and segmentation of radiomics data improve the accuracy of the AI model for better diagnosis and prognosis of cancer [8].
(ii) Feature extraction and selection: The radiomics features in a clinical context include essential geometric features such as the shape and size of the cancer; texture features of first, second, and higher order; intensity features of pixel and voxel values; and statistical features such as histogram and wavelet features [50]. There are two prominent categories of feature extraction for phenotype information of cancer radiography, namely hand-crafted features and deep features, as shown in Figure 7. Feature selection or dimensionality reduction is a crucial step because radiomics data are high-dimensional, which can degrade the performance of the AI model. Alongside the phenotype information, a radiogenomics study also extracts the corresponding genotype information for each cancer patient. The various genotypes whose alterations can cause cancer include IDH, TP53, MGMT, EGFR, PTEN, HER2, and Ki-67, as shown in Table 1.
(iii) Association of radiomics and genomics: In this step, the radiomics and genomics features of the cancer patients are combined to understand the tissue-level characterization of cancerous and non-cancerous regions [51][52][53].
(iv) Data analysis: Deep models include the deep neural network (DNN), the convolutional neural network (CNN) [77], and deep temporal models such as the recurrent neural network (RNN) and the long short-term memory (LSTM) model for temporal genetic data [54]. Various statistical tests, performance evaluation parameters, and performance analysis metrics are likewise involved in the data analysis of radiogenomics. Lesion-localization analysis can be conducted via heatmap analysis for deeper diagnosis [52].
(v) Radiogenomics outcome: This final step is the decision-support stage, encompassing endpoint outcomes such as tumor grading, patient survival prediction, imaging-biomarker generation, clinical decisions, precision medicine, risk stratification, and personalized treatment planning for cancer patients.
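The stages above can be sketched end to end in a few lines. The snippet below uses synthetic placeholder data: per-patient radiomics and genomics feature matrices are concatenated (the association step), and PCA-style dimensionality reduction is performed via an SVD on the centered features. The feature counts and cohort size are invented for illustration only.

```python
import numpy as np

rng = np.random.default_rng(42)
n_patients = 30

# Stage 2-3 stand-ins: per-patient radiomics features (e.g., 50 image features)
# and genomics features (e.g., 10 binary mutation-status indicators)
radiomics = rng.normal(size=(n_patients, 50))
genomics = rng.integers(0, 2, size=(n_patients, 10)).astype(float)

# Association: concatenate the two feature families patient-by-patient
X = np.hstack([radiomics, genomics])                 # shape (30, 60)

# Dimensionality reduction: project onto the top-k principal components
Xc = X - X.mean(axis=0)                              # center each feature
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 5
X_reduced = Xc @ Vt[:k].T                            # (30, 5) compact representation

print(X.shape, X_reduced.shape)
```

The reduced matrix would then feed the data-analysis stage (an ML or DL model) whose predictions constitute the radiogenomics outcome.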

The Era of Radiogenomics in Precision Medicine
The popularity of precision medicine has grown over the last few decades, especially in oncology care. Precision medicine involves optimizing medication according to an individual's phenotypic and genotypic characteristics and the nature of the disease, taking the 'one size fits one' approach [99,100]. This encompasses mathematical modeling and biology, including metabolomics, transcriptomics, proteomics, and genomics [99]. Precision medicine involves identifying specific treatment targets and developing means of tracking changes in these targets using non-invasive and reliable methods [99]. Artificial intelligence applied to '-omics' data and imaging modalities has been utilized in this context to develop models that can predict changes in the targets' environment and monitor therapeutic outcomes, beyond the available standard of care [101].
The shift from the traditional 'one size fits all' to the 'one size fits one' route involves implementing advances across several cross-sectoral, interdisciplinary, and multidisciplinary fields [102]. These advances range from developing tools for big-data analysis and research in individualized medicine to standardizing the possession, repositioning, and sharing of patients' computerized health records, with the involvement of the patients themselves [103,104]. The key to the success of precision medicine lies in computationally intensive tasks: developing effective means of merging radiomics, genomics, and clinical data for data mining [102]. Radiogenomics research demonstrates significant potential for developing non-invasive diagnostic and prognostic markers, especially in the field of oncology [102].
Over the past few years, the rise of radiogenomics in cancer medicine can be attributed to various factors [105]. First is the gap existing between molecular pathology and traditional radiology. A deeper understanding of tumor components has driven the development of tailored cancer therapeutics [106,107]. Second is the rising interest in incorporating artificial intelligence with oncology medicine. The application of ML algorithms to a large-scale imaging database has further driven this process [108]. The third is the growing understanding of the tremendous potential that imaging data hold. This is of particular importance in the case of oncology, where there are temporal and spatial limitations of tissue sampling [105].
As discussed, radiomics involves extracting quantifiable data from clinical radiography and integrating these data with patient data to generate a searchable database. Radiogenomics is then utilized to provide complete information on a heterogeneous tumor or metastatic disease, guiding the development of therapy suited to the individual [99]. Large-scale databases are possible owing to the enormous amount of imaging data. Mining significant radiomics information from these radiological databases requires advanced techniques, frameworks, analytics, and algorithms [100]. However, the greater challenge remains maintaining clarity and consistency in performing such studies. Hence, developing a standardized workflow and internationally agreed methods is necessary for effective and robust studies [100].
Consortiums have been formed to standardize radiogenomics studies [100]. The Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement is a set of instructions covering studies that develop or validate multivariable prediction models [109]. The Image Biomarker Standardization Initiative (IBSI) was founded to deliver standards for calculating commonly used radiomics features (machine and deep) and for the image processing required before feature extraction [100,110]. For clinical translation, standardization of radiomics methods is a prerequisite, and detailed reporting of study quality is of equal importance. Radiomics researchers should follow the findability, accessibility, interoperability, and reusability (FAIR) guiding principles and ensure that research outputs are findable, accessible, interoperable, and reusable [111]. This will ensure the validation and quality assurance of radiomics studies.

What Have We Achieved So Far in Radiogenomics?
Radiogenomics can prove to be a beneficial tool for optimal patient selection in oncology [101]. It can act as a digital, non-invasive biopsy with the ability to identify and quantify tumor infiltration, supporting the development of personalized immunotherapy regimens and continuous monitoring of therapeutic response. Combining imaging data with genomic information looks promising for improving disease diagnosis, prognosis, and the prediction of disease outcomes [101]. The field's progress is well illustrated by the application of radiogenomics studies in numerous cancers such as glioblastoma, hepatocellular carcinoma, non-small cell lung cancer, and hematopoietic tumors [101,102].
The dawn of radiogenomics marks a shift in research from the radiology-pathology level to the genetic level [100]. Over the past decade, radiogenomics has grown steadily through the mining of radiomics, genetic, and clinical data [102]. The development of deep learning and big-data programming is instrumental in radiogenomics research, contributing newer algorithms, workflows, and methods [112]. A significant achievement in radiogenomics is the development of a fully automated system integrated with the radiological workflow, shown in Figure 9 [113]. This reduces the overall time spent on repetitive and tedious tasks while improving efficiency and productivity [114,115]. Another advantage is the real-time monitoring of treatment by simultaneously comparing several images from the database [113,114].
The success of radiogenomics in developing personalized medication regimens is highly dependent on the reproducibility and transparency of the predictive tools and programming algorithms [116]. The availability of guidelines such as TRIPOD has played a crucial role in progressing towards these goals [100]. At the same time, it is of utmost importance that the application of these advanced radiogenomics methods accounts for the intricacies of existing radiobiology knowledge [116]. Imperfect or incomplete datasets in a radiogenomics database may be combined with prior knowledge of outcomes to establish new conclusions [117].

What AI Offers: A Computational Perspective
Advances in measurement techniques such as genomic sequencing and medical radiography have enormously augmented the quantity of patient data accessible to clinicians for radiogenomics. AI, as an advanced set of computational algorithms, is well suited to this setting: it can assist across radiology, from image acquisition, image reconstruction, feature extraction and selection, and data analysis to building models for cancer analysis, treatment prognosis, follow-up planning, and many other tasks [8]. Figure 10 represents how AI improves the entire radiological workflow in current clinical practice. A key reason for choosing AI in radiology (and radiogenomics) is its excellent handling of large volumes of data compared with traditional statistics-based methods. AI-trained models recognize the data by analyzing patterns in phenotype and/or genotype features; these models can then be applied to unseen cohorts to check and validate their accuracy. Beyond classification and regression, AI can serve several other radiogenomics applications such as cancer heterogeneity analysis, tumor progression, and recurrence [8]. AI improves the radiological workflow in three key ways: productivity, quantity, and precision. Productivity increases via automation and the prioritization of routine jobs. In terms of quantity, AI can extract and quantify information semi- or fully automatically. Precision is achieved by ensuring that the correct information is accessible, which is obtained by filtering out unnecessary information. The terms AI, machine learning, and deep learning are often used interchangeably, which creates some confusion. Fundamentally, AI provides broad ways of designing intelligent methods, through radiological data mining, that can efficiently and creatively address radiological problems.
Under the umbrella of AI, numerous ML algorithms such as artificial neural networks, support vector machines, decision trees, random forests, and k-nearest neighbors have proven effective. Neural networks in particular serve as a parent concept spanning architectures from the very simple to the complex, such as the multilayer perceptron (MLP) and deep learning (DL). The Venn diagram in Figure 11 presents AI and its subsets from the perspective of radiological data analysis, showing the role of AI components in oncology care across applications in imaging and digital pathology, drug discovery, precision oncology, patient-data management, next-generation sequencing, and more. Machine learning, a subspace of AI that operates on medical-imaging features combined with genomic features such as mutational status, can be used for classification, regression, clustering, dimensionality reduction, and density estimation, as presented in Figure 12. These methods can be classified by the type of learning: supervised, unsupervised, and reinforcement learning. In supervised learning, the most frequent setting in radiogenomics, the data are labeled prior to training, and these labels also serve as the reference standard for assessing algorithm performance on the test cohort [119]. In unsupervised learning, no prior labels are considered, and the algorithms automatically cluster the inputs based on specific characteristics [120]. In reinforcement learning, the algorithms learn from continuous feedback on their performance in the assigned task, that is, from their errors [120]. All these methods may be combined to augment predictive performance and analysis in radiogenomics studies of cancer.
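As a concrete instance of the supervised setting described above, the sketch below trains a minimal k-nearest-neighbor classifier on labeled feature vectors and predicts the label of an unseen case. The cohort, features, and labels are hypothetical; in practice, library implementations (e.g., scikit-learn) would be used, but the logic is the same.

```python
import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training samples."""
    dists = np.linalg.norm(X_train - x_new, axis=1)      # Euclidean distances
    nearest = np.argsort(dists)[:k]                      # indices of the k closest
    votes = y_train[nearest]
    return int(np.bincount(votes).argmax())              # majority label

# Hypothetical labeled cohort: 2D feature vectors, label 0 = wild type, 1 = mutant
X = np.array([[0.1, 0.2], [0.2, 0.1], [0.0, 0.3],       # class 0 cluster
              [1.0, 1.1], [0.9, 1.2], [1.1, 0.9]])      # class 1 cluster
y = np.array([0, 0, 0, 1, 1, 1])

pred = knn_predict(X, y, np.array([1.0, 1.0]))
print(pred)   # → 1
```

The prior labels here play exactly the role described above: they supervise training and later serve as the reference standard when the model is scored on a held-out test cohort.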
Recent radiogenomics studies based on machine learning have produced encouraging outcomes for cancer prognosis and treatment planning [8]. However, a clear trend can be observed from machine learning toward end-to-end deep-learning models [121]. Deep learning, a subset of machine learning, consists of algorithms inspired by artificial neural networks. A neural network contains several layers, each with a number of nodes. First, there is an input radiological image or lesion (tumor or a certain ROI), followed by several hidden layers. Finally, the output layer comprises the queries the network is expected to answer, such as tumor type classification, survival prediction, etc. The basic flow is given below in Figure 13. At each node of a layer, the outputs of the previous layer are combined and the result is passed on to the next layer. Training a deep neural network essentially means finding the best parameters for each individual node so that, when all the nodes are combined, the model produces the correct response or outcome. Radiogenomics studies [24,122-124] based on deep learning have produced significant outcomes. Therefore, AI, including machine- and deep-learning approaches, plays a very significant role in the radiogenomics study of cancer.
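The layer-by-layer flow described above can be sketched in a few lines, under simplifying assumptions: a single hidden layer, fixed illustrative weights (a real network would learn these during training), and two invented input features standing in for radiomic measurements:

```python
import math

def relu(values):
    """Rectified linear activation applied element-wise."""
    return [max(0.0, v) for v in values]

def sigmoid(x):
    """Squash a score into a probability-like value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def dense(inputs, weights, biases):
    """One layer: each node computes a weighted sum of the
    previous layer's outputs plus a bias."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]

# Hypothetical input: two radiomic features from a tumor ROI.
features = [0.6, 0.3]

# Hidden layer (2 nodes) with illustrative fixed weights.
hidden = relu(dense(features, [[0.5, -0.2], [0.8, 0.4]], [0.1, -0.1]))

# Output layer: a single score for, e.g., tumor-type classification.
score = sigmoid(dense(hidden, [[1.0, -1.0]], [0.0])[0])
print(round(score, 3))  # ≈ 0.46
```

Training would adjust the weight matrices so the output score matches the labeled answer; the forward pass itself is just this chain of weighted sums and activations.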

Cross-Validation: A Crucial Step
Cross-validation is a technique to measure the effectiveness of an AI-based model. In radiology, it plays a crucial role in building a generalizable model. It is essentially a resampling technique that evaluates how a model will perform on an independent cohort. If an algorithm is not properly cross-validated, it can yield heavily biased accuracy estimates. Table 2 summarizes the different types of cross-validation. Table 2. Cross-validation for AI-based models.

Leave-one-out cross-validation
An extreme type of CV that leaves one sample out of the n total samples; the remaining n − 1 samples are used to train the model and the held-out sample serves as the validation set.

Hold-out cross-validation
The usual train/test split: the dataset is arbitrarily partitioned into two parts, one for training and one for testing (validation).

k-fold cross-validation
The dataset is partitioned into k parts such that, each time, one of the k parts is used as the validation set and the remaining k − 1 parts as the training set.

Stratified k-fold cross-validation
A small variation of k-fold CV in which each fold contains approximately the same strata of samples.

Nested cross-validation
Otherwise known as double cross-validation, in which k-fold cross-validation is employed within each fold of an outer cross-validation, often to tune the hyperparameters during model evaluation.
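The k-fold procedure in Table 2 can be sketched in a few lines of plain Python; the per-fold scoring below is a toy stand-in (averaging precomputed per-sample values) for training and evaluating a real model:

```python
def k_fold_splits(n_samples, k):
    """Yield (train_idx, val_idx) pairs: each fold serves once as the
    validation set while the remaining k-1 folds form the training set."""
    indices = list(range(n_samples))
    fold_size = n_samples // k
    for i in range(k):
        val = indices[i * fold_size:(i + 1) * fold_size]
        train = indices[:i * fold_size] + indices[(i + 1) * fold_size:]
        yield train, val

# Toy per-sample values standing in for "train on train_idx,
# score on val_idx" with a real model.
data = [0.9, 0.8, 0.7, 0.6, 0.85, 0.75]

scores = []
for train_idx, val_idx in k_fold_splits(len(data), k=3):
    scores.append(sum(data[i] for i in val_idx) / len(val_idx))

print(sum(scores) / len(scores))  # mean cross-validated score
```

Because every sample appears in exactly one validation fold, the averaged score is less biased than a single arbitrary train/test split, which is why cross-validation matters for generalizability claims.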

Performance Metrics: An Essential Step in the Evaluation of the AI Models
Accurate evaluation of the algorithm is a very important step. Table 3 includes the metrics used for evaluation in different radiogenomics studies in oncology.

Kaplan-Meier Curve
A visual representation of the function that shows the probability of an event at a respective time interval.

Mean Absolute Error (MAE)
The average of the absolute differences between the ground-truth and predicted values of the regression model: MAE = (1/N) Σ |y_i − y_i^p|.

Mean Square Error (MSE)
The average of the squared differences between the target and predicted values of the regression model: MSE = (1/N) Σ (y_i − y_i^p)^2.

R^2 (R-Squared)
A statistical measure of fit that indicates how much of the total variation of the dependent variable is explained by the independent variable(s) in the regression model: R^2 = 1 − (unexplained variation / total variation).

Where TP = true positive; TN = true negative; FP = false positive; FN = false negative; y_i and y_i^p are the target and predicted values; and N is the total number of samples.
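The regression metrics above follow directly from their definitions; a minimal sketch with hypothetical target and predicted values:

```python
def mae(y, y_pred):
    """Mean absolute error: average |target - prediction|."""
    return sum(abs(t - p) for t, p in zip(y, y_pred)) / len(y)

def mse(y, y_pred):
    """Mean squared error: average (target - prediction)^2."""
    return sum((t - p) ** 2 for t, p in zip(y, y_pred)) / len(y)

def r_squared(y, y_pred):
    """R^2 = 1 - (unexplained variation / total variation)."""
    mean_y = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, y_pred))  # unexplained
    ss_tot = sum((t - mean_y) ** 2 for t in y)             # total
    return 1.0 - ss_res / ss_tot

# Hypothetical targets and regression predictions.
y = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.8, 5.2, 6.9, 9.3]

print(mae(y, y_pred), mse(y, y_pred), r_squared(y, y_pred))
```

Note that MSE penalizes large errors more heavily than MAE, while R^2 is scale-free, which is why studies often report more than one of these together.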

Is AI Efficient for Radiogenomics?
The application of AI in radiogenomics is highly promising [125]. The first step involves using AI at the detector level to process data for reconstructing images, including corrections for scattering, attenuation, etc. [126]. Further uses of AI include image processing (fusion [48], segmentation [74], etc.). Finally, AI is used to generate models for personalized medicine based on information extracted from the images in the database [126]. The quality, generalizability, and robustness of the algorithms and classification models will determine the clinical success of radiomics and artificial intelligence [127].
Although AI and radiogenomics research is growing exponentially, clinical implementation is yet to be achieved. For improved implementation, models should be presented so that clinicians can understand and interpret the results adequately to make appropriate treatment decisions. Models must be well trained and validated and, at the same time, must be transparent in providing risk information for an individual's prediction [127].

AI in Radiogenomics Studies of Different Cancer Types
According to the World Health Organization (WHO) [128], brain, breast, and lung cancers are among the leading causes of death in the global population with an average age of 70. In Table 4, the role of AI-based radiogenomics in different types of cancer studies is briefly explained in terms of the motivation of the research, radiomics and genomics information, AI-based imaging signatures for predicting genomic status, cohort information, and the performance metrics involved, along with limitations and suggestions.
It is observed that machine- and deep-learning imaging signatures have been used in nearly equal proportions for predicting the status of genetic information from the radiomics features provided; however, ML-based signatures have been used somewhat more, as discussed in the statistical analysis in Section 2 and Figure 3a. The application and cohort size play a primary role in the choice between machine-learning and deep-learning paradigms [129,130]. For supervised learning on smaller cohorts, augmentation is applied during training, while no augmentation is applied to the testing data. ML-based imaging signatures rely on hand-crafted radiomics features based on shape, size, grade, tissue, texture, histogram, etc., whereas DL-based signatures generate deep radiomics features automatically to predict genomic information. The DL-based imaging signature therefore provides more automation than machine-learning models. As a result, researchers tend toward ML-based imaging signatures when they want to limit training time or have less data with which to train the models. Deep learning is a data-hungry approach that needs more radiogenomics data and more training time to deliver more precise results. The most fundamental challenge in current research is hands-on access to radiogenomics datasets; it is recommended that such datasets be made publicly available for the development of advanced AI tools leading to product design for clinical settings. Hence, there is a trade-off between the two technologies for predicting the genomic status of a given tumor. Among the machine-learning methodologies, SVM, RF, DT, NB, XGBoost, ensemble, and univariate and multivariate regression models are used in greater proportion than other ML or DL models.
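The train-only augmentation convention mentioned above can be sketched with a deliberately tiny example: a horizontal flip applied to toy 2 × 2 "images" in the training cohort while the test cohort is left untouched (real pipelines use richer transforms such as rotations and intensity shifts, and real images, but the principle is the same):

```python
def hflip(image):
    """Horizontal flip: one simple geometric augmentation."""
    return [row[::-1] for row in image]

def augment(train_images):
    """Augmentation is applied to the training cohort only;
    the test cohort must never be augmented."""
    flipped = [hflip(img) for img in train_images]
    return train_images + flipped

train = [[[1, 2], [3, 4]]]   # toy 2x2 "image" in the training set
test = [[[5, 6], [7, 8]]]    # test set stays as acquired

train_aug = augment(train)
print(len(train_aug), len(test))  # training doubled; test unchanged
```

Keeping the test cohort unaugmented ensures that reported performance reflects the data distribution the model will actually face.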
The radiogenomics approach to diagnosis using artificial intelligence is gaining popularity for the more frequently occurring cancers, such as breast, brain, and lung cancers. The genomic markers most frequently predicted using AI paradigms for these cancer types, namely MGMT, IDH1/2, BRCA1/2, Luminal A/B, ER, PR, EGFR, Ki-67, and HER2, are shown in the table below. Again, due to a lack of sufficient genomic data, machine-learning approaches are preferred for these types of tumors. Since radiogenomics is more prevalent in the medical sciences than in the engineering sciences, AI tools such as machine learning and deep learning are likely to prove a foundational strategy for patient management while shortening the learning curve for medical practitioners [131,132]. The performance metrics generated by AI-based imaging signatures, such as accuracy and area under the curve (AUC), are adopted by most authors as the standard parameters. However, other parameters such as sensitivity, specificity, precision, F1-score, and other statistical measures can be used for further validation of the AI models' performance.
Going through Table 4 (a-h), we can clearly observe the significant role of AI radiogenomics in different types of cancer in the era of precision medicine. Table 4 (a) discusses some recent AI-based studies in radiogenomics for breast oncology care. This cluster of studies focuses on predicting the status of the most prominent genetic markers in breast cancer, such as BRCA1/2, Luminal A/B, HER2, Ki-67, ER, and PR. For predicting the status of these markers, machine-learning models such as SVM, RF, DT, NB, XGBoost, ensemble, and univariate and multivariate regression seem very promising, as evidenced by their adoption by several researchers, although deep-learning models have been used in some scenarios. Implementing radiogenomics in breast cancer care involves a few popular datasets, such as The Cancer Imaging Archive (TCIA), The Cancer Genome Atlas (TCGA), and Full-Field Digital Mammography (FFDM). The standard performance metric used to assess the imaging signatures is the area under the curve (AUC); across the studies considered, typical performance for the above genomic markers lies between ~60% and ~80%. Table 4 (b) focuses on brain oncology care using AI-based imaging signatures, considering some of the more promising studies predicting brain cancer mutations. The frequently studied markers are IDH1, IDH2, MGMT, EGFR, PTEN, PDGFRA, CDKN2A, TP53, and RB1; from the radiogenomics studies considered, the most common genetic alterations in brain cancer are IDH1, IDH2, and MGMT. The imaging signatures found for predicting these genomic types are a mixture of machine and deep learning; however, ML-based signatures such as logistic regression, XGBoost, random forest, and decision trees are predominantly used compared with DL-based models.
The common cohorts available for the radiogenomics of brain cancer using AI paradigms are The Cancer Imaging Archive (TCIA), The Cancer Genome Atlas (TCGA), and some institution-specific data from multicenter hospital environments. The performance metrics for measuring model performance are the area under the curve (AUC) along with accuracy, sensitivity, specificity, precision, and F1-score; both AUC and accuracy in these studies are typically near ~85%.
Lung cancer, the most frequently observed cancer type, is discussed in Table 4 (c). Mutations of genes such as EGFR, KRAS, and TP53, along with a few RNA-sequencing profiles, are considered the most frequently associated. For predicting genomic status, the most popular AI-based imaging signatures include a mixture of ML and DL, with models such as CNN, 3D CNN, SVM, random forest, and generalized ML linear models. The common databases for the radiogenomics of lung cancer include The Cancer Imaging Archive (TCIA) and other multi-institutional databases. AUC is the most frequently reported performance metric, with a mean value of ~80% across the cluster of models considered in this category.
Further, Table 4 (d) discusses some recent AI-based models in radiogenomics for liver oncology care. The motivation in this category is the prediction of early recurrence of hepatocellular carcinoma (HCC) and the associated survival prediction. The genomic alterations for HCC include TP53, TOP2A, CTNNB1, CDKN2A, AKT1, alpha-fetoprotein (AFP), and DCP. Again, machine-learning imaging signatures are the preferred choice, with a standard mean AUC of ~85%; other statistical measures, such as Kaplan-Meier analysis and Cox regression, are also involved in the evaluation.
Recent AI-based models in radiogenomics for prostate oncology care are discussed in Table 4 (e). The common diagnostic task for prostate cancer is the prediction of tumor aggressiveness. Machine- and deep-learning models are used for this purpose, with AUC and accuracy as the standard performance metrics. In these studies, authors focused on deep-learning models such as CNN, ResNet-101, and LSTM for building radiogenomics-based models and obtained very promising AUCs of more than 0.9. However, to make models more generalizable and robust, the use of multi-institutional cohorts is highly recommended in future work.
Table 4 (f) discusses AI imaging signatures in radiogenomics studies for ovarian oncology care. The major tasks in this category involve the prediction of PFS in advanced HGSOC and of PM in ovarian cancer. Here, mainly machine-learning models such as KNN, SVM, logistic regression, and ensemble-based learning have been considered for building models, with very promising AUCs of around 85%. From a database point of view, although The Cancer Imaging Archive (TCIA) is the common database, multi-institutional databases are also used. For future work, the use of multiple multiparametric scans is highly recommended instead of a single modality for ovarian radiogenomics studies. Table 4 (g) discusses radiogenomics studies of colorectal cancer using AI imaging signatures. Prediction of KRAS mutations is the major genomic task associated with this cancer type; the other markers involved are NRAS, BRAF, and AF. To determine genomic mutational status, machine-learning models such as SVM, naïve Bayes classifiers, decision trees, and RELIEF have been implemented, obtaining AUCs of more than 85% in the majority of the studies. Various public medical institute databases and TCIA have been used for building predictive models; an increase in testing cohort size is highly recommended here to make the imaging (genomic) signatures robust. Table 4 (h) describes some recent AI-based imaging signatures in the radiogenomics of gastric oncology care. Predicting lymph node metastasis and the status of PD-L1 and PM in gastric cancer (GC) are the major objectives of radiogenomics using AI. Machine-learning paradigms such as SVM, decision trees, random forest, and multivariate logistic regression dominate over deep-learning models in predicting genomic status.
Traditional machine-learning features such as intensity and first- and second-order statistics are very promising for analyzing imaging phenotypes in gastric cancer, with an impressive AUC of more than 75% on the testing cohort.
Though radiogenomics studies of cancer using AI paradigms have several key benefits, certain challenges (described in the next section) explain why oncologists remain hesitant to use AI in radiogenomics frequently in current clinical practice.

Benchmarking: Comparison between Different Radiogenomics Reviews
We have observed that recent progress in AI and genomic sequencing of cancers has provided new hope for radiogenomics studies in individualized and precision medicine. For this reason, it has drawn significant attention from researchers and scientists, and several reviews based on radiogenomics have recently been performed by different groups around the world. The benchmarking of the proposed review against existing work is provided in Table 5. Benchmarking was performed on four components: the first includes the different anatomical cancers in humans; the second covers the different aspects of AI, such as machine learning and deep learning, cross-validation, and performance metrics; the third covers radiogenomics aspects such as conventional and deep radiomics and genotypes; and the last covers the cohort description. The main aim of the proposed review is to cover all aspects of radiogenomics, including brief fundamentals of AI and what it offers across different genotypes of multiple cancers of the human anatomy, together with artificial intelligence's achievements and challenges in current clinical practice. In this respect, this review goes beyond other comparable studies.

Clinical Challenges and A View for the Future
The potential of radiogenomics in tumor diagnosis, prognosis, and prediction is immense; however, translations into clinical settings are slow due to several associated challenges [175]. Adopting radiogenomics practices into clinical settings needs to overcome these significant challenges. The biggest challenge is the storage, management, extraction, analysis, integration, visualization, and communication of the information generated from the myriad of available data [176]. Integrating such heterogeneous and multifactorial data in a cost-effective, standardized, and secure manner is essential. Initiatives such as the Cancer Research UK's Stratified Medicine Program (Cancer Research UK, 2013) and the Center for Advancing Translational Science under the National Institutes of Health (NIH, 2011) have been started for better management of radiomics research in oncology [176].
The nature and variability of data are also critical standing challenges. Although vast imaging data are readily available, institutional heterogeneity (inter- or intra-) exists because of differences in scan protocols, hardware, and post-processing steps, thereby limiting the generalizability of findings [176]. Differences in image acquisition parameters, scanner settings, and contrast enhancement protocols exist [127,177]. One study reported that even when the same scanning protocol was used for image acquisition, radiomic feature calculations still differed [178]. This reduces the reproducibility of results and impedes the development of appropriate radiogenomics models [178]. Therefore, it is necessary to implement standard practice guidelines to ensure the reliability and accuracy of radiogenomics studies [100].
Data availability from genetic tests is still limited and carrying out large-scale genetic testing may not be cost-effective, in addition to being challenging. The use of genetic and imaging repositories may provide a cost-effective solution [172]. Current data generated using radiogenomics are from retrospective studies with a small patient cohort; therefore, a conclusion is usually limited and cannot be generalized, warranting more extensive prospective studies [99]. Inadequacy in data stratification may result from the lack of the required volume of data leading to compromised data adaptation, optimization, and evaluation [100]. Limited-size datasets pose a high risk of overfitting models, leading to poor generalization. This can lead to incompetent decision-making with high false-positive examination rates from multicenter with different devices and imaging protocols [172,179,180]. In clinical oncology practices, it is also required to have the availability of quantitative descriptors with interpretability. This will enable a better investigation to address the heterogeneity of tumors [100,181].
A significant task in radiogenomics is the interpretation of the algorithms, which are highly complex; their inner workings are not easy to interpret, a property referred to as their 'black box' nature [182]. This hinders the acceptance of such technology in healthcare. An easy-to-explain algorithm allows evaluation of its outputs and provides feedback for improvement. These algorithms are highly dependent on the standards available for interpreting data, which, although highly relevant, may also serve as a source of bias [183]. In various instances, the outcomes of these algorithms have proven more reproducible and consistent than human readings, but this induces more patient examinations and can result in overdiagnosis [182]. At the same time, personalized management decisions recommended by complex algorithms may be difficult to explain, and errors and biases may become harder to detect [184].
Another significant challenge associated with radiogenomics research is the limited number of laboratories conducting such research, owing to its cost and difficulty [172]. Further, the required laboratory certifications and personnel expertise make the process cumbersome. Additionally, in most cases molecular and genetic analysis takes place outside hospital settings, making data integration a herculean task due to the different genome sequencing technologies applied by commercial platforms [172]. This culminates in limited imaging and genomic datasets, restricting the expansion of radiogenomics approaches.
Though AI in radiogenomics has accomplished a great deal in oncology, as shown in several studies, there is still a long way to go before oncologists use it regularly in clinics. Apart from the challenges discussed above, a significant one is the effective organization and preprocessing of large-scale multi-institutional cohorts. Handling multi-institutional data is challenging in terms of processing, costs, and the ethical clearance procedures of different institutions; however, if achieved intelligently, it would make radiogenomics studies far more clinically reliable. For example, if institutions cannot share their data due to ethical constraints, they can instead share their trained AI models and test them on their own cohorts, and researchers can then combine the models and conduct further analysis, yielding more robust and generalizable results. Additionally, one crucial concern in radiogenomics studies is that properly nested cross-validation must be performed to avoid overfitting, a common pitfall in AI. Many studies have reported very high accuracy while ignoring the fact that cancers are heterogeneous and that the exact imaging phenotype is not easy for an AI model to learn. Looking to the future, AI will become an essential tool in the radiogenomics of cancer if the challenges discussed are handled appropriately.
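The model-sharing idea above, in which institutions exchange trained models rather than raw patient data, can be sketched as simple probability averaging across independently trained models. Everything below is a hypothetical stand-in: real federated schemes aggregate trained parameters or calibrated outputs, and the fixed probabilities here merely represent each institution's model:

```python
def ensemble_predict(models, x, threshold=0.5):
    """Average the probability outputs of independently trained
    models (shared in place of raw patient data) and threshold
    the mean to obtain a binary decision."""
    p = sum(model(x) for model in models) / len(models)
    return p, int(p >= threshold)

# Stand-ins for models each institution trained on its own cohort;
# each returns a predicted mutation probability for a case x.
model_a = lambda x: 0.7   # institution A
model_b = lambda x: 0.6   # institution B
model_c = lambda x: 0.4   # institution C

prob, label = ensemble_predict([model_a, model_b, model_c], x=None)
print(prob, label)
```

Combining models this way lets each institution keep its data in-house while the pooled prediction still benefits from all three cohorts.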

Conclusions
In recent years, AI in radiogenomics has presented novel solutions to current clinical challenges in treating cancers and has shown promising outcomes for personalized prognosis and treatment planning. As discussed, it has been applied extensively in various cancer studies, such as survival prediction, progression-free survival, and cancer heterogeneity analysis, in the era of precision medicine. However, we have noticed that certain studies have been conducted with small amounts of data and a lack of (i) multi-institutional data, (ii) proper cross-validation analysis, (iii) generalizable results, and (iv) robustness, thereby posing challenges and shaking oncologists' confidence regarding its use in regular clinical practice. In the future, we suggest that studies emphasize eliminating the current limitations of AI in radiogenomics and making AI methods more efficient for clinical purposes.

Funding: This research received no external funding.

Conflicts of Interest:
The authors declare no conflict of interest.