Predicting Survival in Patients with Brain Tumors: Current State-of-the-Art of AI Methods Applied to MRI

Given growing clinical needs, Artificial Intelligence (AI) techniques have increasingly been used in recent years to define the best approaches for survival assessment and prediction in patients with brain tumors. Advances in computational resources, and the collection of (mainly) public databases, have promoted this rapid development. This narrative review aimed to survey current state-of-the-art applications of AI in predicting survival in patients with brain tumors, with a focus on Magnetic Resonance Imaging (MRI). An extensive search was performed on PubMed and Google Scholar using a Boolean search query based on MeSH terms, restricted to the period between 2012 and 2022. Fifty studies were selected, mainly based on Machine Learning (ML), Deep Learning (DL), radiomics-based methods, and methods that exploit traditional imaging techniques for survival assessment. In addition, we focused on two distinct tasks related to survival assessment: the first is the classification of subjects into survival classes (short- and long-term, or alternatively short-, mid- and long-term) to stratify patients into distinct groups; the second is the quantification, in days or months, of the individual survival interval. Our survey showed excellent state-of-the-art methods for the first task, with accuracy up to ∼98%. The second task appears to be the more challenging, but state-of-the-art techniques showed promising results, albeit with limitations, with C-Index up to ∼0.91. In conclusion, the available computational methods perform differently according to the specific task, and the choice of the best one to use is not clear-cut, depending on many aspects. Unequivocally, the use of features derived from quantitative imaging has been shown to be advantageous for AI applications, including survival prediction.
This evidence from the literature motivates further research in the field of AI-powered methods for survival prediction in patients with brain tumors, in particular, using the wealth of information provided by quantitative MRI techniques.


Introduction
Artificial intelligence (AI) is a branch of computer science that has been successfully applied to the analysis and extraction of meaningful features from medical images, with various clinical applications [1]. In particular, the use of AI in brain imaging has been fruitful, showing promising results and generating new perspectives for diagnosis, prognosis and treatment planning [2][3][4][5].
Brain tumors are among the top ten causes of death from cancer [6,7] and can be metastatic or primary. Gliomas account for about 80% of primary malignant brain tumors; they include different sub-types, of which the most common are glioblastoma (GBM), astrocytoma, oligodendroglioma, and ependymoma [8,9]. Some subtype-specific characteristics, such as cell invasion and proliferation, angiogenesis, apoptosis, and the high degree of heterogeneity, contribute to both increased morbidity and mortality [10]. Among gliomas, GBM is the most aggressive and heterogeneous (at the tissue, cellular and molecular levels), with the highest short-term mortality rate [11,12]. Currently, the average 5-year survival rate for GBM is 5.6-7%, while the median survival is about 12-15 months [13][14][15][16]. Despite aggressive management with surgery, radiotherapy, and chemotherapy, overall survival (OS) remains dismal [15,[17][18][19]. Magnetic Resonance Imaging (MRI) is the modality of choice in neuro-oncology for diagnosis, treatment response evaluation and prognosis prediction, since it is non-invasive and can convey a considerable amount of information about both the tumor and the surrounding areas [20,21]. Several MR techniques are routinely used to image brain tumors, aiding management from diagnosis to therapy assessment [22][23][24], and many novel techniques are in active development [20,25]; however, current imaging alone remains insufficient for accurate prognostication.
Partially prompted by this unmet need, recent years have seen an increasing interest in applying AI techniques to MRI. Great emphasis has been placed on radiomics, a technique that aims to extract quantitative and reproducible features from images, including complex patterns that are often not visible to the human eye [26,27]. Specifically, radiomics refers to the high-throughput extraction of quantitative features, resulting in the conversion of images into mineable data and the subsequent analysis of these data for decision support [28]. This technique has been applied to several imaging modalities, including Ultrasound (US) [29], Computed Tomography (CT) [30] and MRI [31]. These approaches have been primarily applied to oncology, although there has been growing interest also in cardiovascular applications [30,[32][33][34]. Through the study of quantitative features extracted from MR images, by computing local macro- and micro-scale morphological changes in texture patterns, radiomics can accurately reflect the underlying pathophysiology of the disease by capturing statistical inter-relationships between the voxels under examination [6,11,[35][36][37][38][39].
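As a minimal illustration of the kind of quantitative features radiomics extracts, the following Python sketch computes a few first-order statistics (mean, standard deviation, intensity entropy) from a flattened region of interest. This is an illustrative subset of the hundreds of shape, first-order and texture features typically extracted by dedicated packages; the function name and feature set are ours, not taken from any of the reviewed studies.

```python
import math

def first_order_features(roi):
    """Compute a small set of first-order radiomic features from the
    voxel intensities of a region of interest (given as a flat list)."""
    n = len(roi)
    mean = sum(roi) / n
    variance = sum((v - mean) ** 2 for v in roi) / n
    std = math.sqrt(variance)
    # Shannon entropy over the discrete intensity values
    counts = {}
    for v in roi:
        counts[v] = counts.get(v, 0) + 1
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {"mean": mean, "std": std, "entropy": entropy}
```

In a full pipeline, such features would be computed per tumor sub-region (enhancing tumor, necrosis, edema) and fed to the downstream survival model.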
A growing body of evidence suggests that radiomic analysis of MR images can aid OS prediction, while also influencing patient management [40,41]. Therefore, these "surrogate" predictors of patient survival are of fundamental clinical interest: in particular, prediction of OS and survival classification (SC) into groups (long-term versus short-term survival, i.e., survival stratification) would be of utmost importance in treatment evaluation and follow-up management [40][41][42][43]. Different methods based on Machine Learning (ML) and Deep Learning (DL), and related algorithms, have been proposed to address this need for assessing survival. Traditional ML-based methods, such as support vector machines (SVMs), the k-nearest neighbors algorithm (k-NN), and random forests (RFs), are generally utilized for brain tumor survival analysis. However, these ML-based methods share the common limitation of hand-crafted feature extraction [44]. DL-based methods overcome this drawback [45,46], having the ability to learn and self-determine the best features to use in a prediction model [47]. These DL methods, based on completely different approaches, have shown varying performance in SC and prediction of OS. Despite the advantages and disadvantages of both ML and DL methods, establishing which one is better is not possible, since the performance of the various algorithms may vary depending on the specific task (OS or SC) and on the composition and quality of the dataset. The focus of this narrative review is to explore currently published AI techniques applied to MRI for OS prediction and long/short-term SC in patients with brain tumors. Several methods proposed over the last 10 years have been investigated. In the results section, ML and DL algorithms are presented chronologically, and the best-performing ones are then compared.
This manuscript is organized as follows. Section 2 describes the methodology adopted in the literature review. The selected papers are briefly described in Section 3. Discussions and final remarks are presented in Section 4.

Literature Review
A literature review was performed according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (Figure 1). The PubMed and Google Scholar databases were searched to identify all potentially relevant studies back to 1 January 2012. The search query was built using medical subject headings (MeSH) related to AI and brain. The following search query was used on both databases, restricted to original articles published between 2012 and 2022:

("Machine Learning" OR "artificial intelligence" OR "Deep Learning") AND brain AND (tumor OR tumour) AND (survival OR "life expectancy") AND (pediatric OR paediatric OR adults) AND (MRI OR "magnetic resonance")

All studies evaluating AI and ML models for survival prediction in patients with brain tumors were included. The initial search returned 1889 results (59 from PubMed, 1830 from Google Scholar), with a significant imbalance of results from Google Scholar. Following manual elimination of duplicates, titles were carefully screened to identify relevant papers. Any work that matched at least one of the following exclusion criteria was excluded:
• no full-text available;
• no AI application or non-pertinent application;
• conference proceedings;
• books or book chapters;
• non-English manuscripts.
Review of the titles narrowed the results to 144 articles: 59 papers from PubMed and 85 from Google Scholar. Subsequently, review of the abstracts further narrowed the results to 88 articles: 35 papers from PubMed and 53 from Google Scholar. Despite the review of titles and abstracts, not all the articles found on Google Scholar were relevant; moreover, some were not indexed in PubMed. Considering the target audience and the push towards translational applications, we decided to include only indexed papers in this survey. Hence, after final revision, 50 papers (24 from PubMed, 26 from Google Scholar) were deemed eligible and included in this review.

Metrics
Several metrics can be used to evaluate a model; the most popular and well-known are accuracy, sensitivity, specificity, and the area under the receiver operating characteristic curve (AUC), which illustrates the diagnostic ability of a binary classifier as its cut-off value varies. When estimating the performance of a model that predicts survival times, the concordance index (C-Index) is a useful assessment metric. To account for the heterogeneity of the methods among the selected papers, C-Index and accuracy were used as comparison metrics for the OS and SC tasks, respectively.
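As a concrete reference, the C-Index can be computed as the fraction of comparable patient pairs whose predicted risk ordering agrees with their observed survival ordering, with ties in predicted risk counted as one half. The following is a minimal Python sketch of Harrell's C-Index for right-censored data; it is illustrative, not code from any of the reviewed studies.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-Index.

    times       : observed survival times
    events      : 1 if the event (death) was observed, 0 if censored
    risk_scores : model-predicted risk (higher = shorter expected survival)
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # a pair is comparable if subject i had an observed event
            # strictly before subject j's follow-up time
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # predicted ordering agrees
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied prediction
    return concordant / comparable
```

A C-Index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why values such as ∼0.91 represent strong performance on the OS task.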

Results
Recent years have seen an increasing interest in using AI applications for survival prediction and risk stratification. Figure 2 shows the number of papers included in this review according to their publishing year. A variety of methods have been proposed over the past decade, with progressive developments leading to current state-of-the-art methods. Older methods were based on clinical [48,49], pathological [50,51] and imaging [30,52] biomarkers; these were gradually refined with the deployment of ML and DL techniques. In this review, ML and DL algorithms are presented chronologically. Subsequently, the performance of the best-performing algorithms is briefly discussed.

Years: 2012-2016
A series of studies focused on defining paradigms that show the potential combination of clinical and computer-aided methods. These pioneering studies laid the theoretical foundation for subsequent research based on ML and DL methods. Most of these studies focused on glioma and, in particular, GBM. However, some results can be generalised across the wide spectrum of brain tumors.
Zacharaki et al. [53] showed how predictive models, based on data mining algorithms applied to imaging features, provide more accurate prognostic predictions than traditional histopathological classification alone. Macyszyn et al. [54] showed how imaging patterns, analysed with ML techniques, could provide useful information for survival prediction. Oermann et al. [55] developed an Artificial Neural Network for OS prediction that outperformed traditional statistical tools and scoring indices for individual patient prognosis prediction. Kickingereder et al. [56], in a pioneering study of radiomic profiling of GBM, identified survival-related imaging predictors that performed better than clinical and risk models. Radiomic signatures based on MRI images provided a benefit in OS prediction and risk stratification. The authors designed a Cox proportional hazards (CPH) model using radiomic features previously selected with a dimensionality reduction technique. Comparing the use of radiomic features alone versus in conjunction with clinical information, they found a C-Index of 0.696.
Emblem et al. [57] showed the usefulness of an SVM for SC. This model, developed specifically to perform early survival prediction in patients with glioma, showed that relative cerebral blood volume (rCBV) was the most significant imaging parameter for survival prediction. Gutman et al. [21] showed that image features, such as lesion size and enhancement after gadolinium-based contrast agent (GBCA), correlate with OS. In a pioneering work, Jain et al. [58] demonstrated a correlation between morphological features and haemodynamic parameters. This study focused on the non-enhancing tumor component and showed a significant correlation between the rCBV in the non-enhancing tumor region (NER) and the lack of epidermal growth factor receptor (EGFR) mutation.
Some studies [59,60] proposed the use of shape features extracted from the presumed necrotic area, or algorithmically assessed shape features, to improve survival prediction. Liu et al. [61] proposed the use of a functional and structural brain network, integrating information from Diffusion Tensor Imaging (DTI) and functional MRI (fMRI). Survival estimation, and the subsequent subdivision into risk classes, was improved by using these quantities, which provided complementary information. This led to a classification accuracy of 75.0%, significantly higher than the accuracy provided by clinical information alone (63.2%).
These papers served as a prompt for the scientific community, and also demonstrated the potential utility of using additional information (such as radiomic, genomic, and histopathologic data [62,63]) in the predictive models rather than clinical information alone.

Years: 2016-2018
Research gained momentum and, while still focusing on ML and radiomics, as DL became more accessible, an increasing number of studies explored its applications in medical imaging. Several clinical, functional, radiomic and morphological biomarkers were increasingly being included in the portfolio of information used to predict survival, and found to provide added value.
Kim et al. [42] emphasized the importance of the Apparent Diffusion Coefficient (ADC) as a survival predictor biomarker, demonstrating its significant correlation with survival. Sanghani et al. [64] showed how OS prediction is improved by using different types of radiomic features (volumetric, shape and texture) from multi-parametric MRI. They used an SVM classifier set up for stratification into 2 and 3 survival classes (short- and long-survival groups, and short-, mid- and long-survival groups). The stratified 5-fold cross-validation accuracy obtained for the 2-class classifier was 97.5%, while that for the 3-class classifier was 87.1%.
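Stratified k-fold cross-validation, as used in [64], keeps the class proportions (e.g., short- versus long-survivors) approximately equal in each fold, which matters when survival classes are imbalanced. The following Python sketch of the fold-assignment step is illustrative and not the authors' implementation:

```python
def stratified_kfold_indices(labels, k=5):
    """Partition sample indices into k folds while preserving the
    per-class proportions (round-robin assignment within each class)."""
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    for class_indices in by_class.values():
        for position, i in enumerate(class_indices):
            folds[position % k].append(i)
    return folds
```

Each fold then serves once as the held-out test set while the remaining folds train the classifier, and the per-fold accuracies are averaged.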
Several studies highlighted that integrating multi-modal imaging and radiomic phenotyping was beneficial for OS prediction [31,56,65,66]. Bae [65] showed that the combination of multi-modal images was decisive for improving OS prediction; moreover, among all the considered features, those derived from MRI were the most predictive and, therefore, relevant. For OS prediction, this model showed a C-Index of 0.61; a C-Index of 0.71 was obtained by including the remaining feature set and clinical information in the model. Kickingereder et al. [67] obtained a C-Index of 0.77 using a radiomic signature composed of 8 features and a least absolute shrinkage and selection operator (LASSO)-penalized Cox regression model. A subsequent study performed by Prasanna et al. [35] showed that the use of features from the peritumoral brain parenchyma could also aid in predicting long- versus short-term survival. In this case, 10 features from peritumoral regions were found to be predictive when compared with features from enhancing tumor, necrotic regions and clinical characteristics. The combination of clinical and radiomic features generated a model with a C-Index of 0.734, improving GBM survival prediction. Another study conducted by Kim et al. [68] showed preliminary evidence that, in the peritumoral NER, fractional anisotropy (FA) and normalized rCBV (nCBV) features could improve OS prediction. OS was estimated by analysing radiomic features extracted in the NER. The model combining nCBV and FA performed better than radiomic models based on a single imaging method and obtained a C-Index of 0.87. However, given the retrospective nature of these studies and the small sample sizes, the generalizability and statistical power of these data may be limited.
A few studies [36,37] explored the effect of heterogeneity on survival stratification, highlighting that the distribution of heterogeneity within the tumor was a determining parameter for correct classification (the classification accuracy was in the range 78.2-80.7% [min-max]). A pioneering study [11] identified spatial image features from tumor habitats and subregions that were associated with survival time. In particular, spatial features extracted from tumor habitats were effective in predicting survival [11,69]. Two databases of GBM images were used, and the model with habitat-based features for survival prediction showed promising accuracy in both (86.7-87.5%). Suter et al. [70] emphasised that the use of robust radiomic features could benefit the generalizability of OS-specific models, especially for single-centre models applied to unseen multi-centre datasets. A separate study [38] evaluated the impact of brain functional networks on OS, achieving an accuracy of 86.8% using resting-state fMRI (rs-fMRI)-derived information. This study was based on the hypothesis that connectomics-based features could capture tumor-induced network-level alterations that can influence prognosis, underlining the importance of including rs-fMRI in the pre-surgical workup of patients with glioma. Nematollahi et al. [71] showed the impact of a decision tree trained using both clinical and radiomic features; they obtained an accuracy of 90.9% for OS classification using the C5.0 decision tree algorithm.
Preliminary results of DL applications were also beginning to emerge. Nie et al. [43] developed a DL framework for automatic extraction of features from multimodal MR images (T1-weighted [T1w] imaging, fMRI, DTI), combining elements of both DL (deeply learned features) and traditional ML (handcrafted features), using a three-dimensional convolutional neural network (3D-CNN) and generating a new network architecture for using multichannel data and learning supervised features. In the long versus short survival classification task (SC), i.e., dichotomous classification, the model achieved an accuracy of 89.9%. The authors particularly stressed the relevance of the features learned from DTI and fMRI.

Years: 2019-2020
Between 2019 and 2020, the focus shifted from ML towards DL, hybrid techniques (e.g., mixed DL + ML techniques) and CNNs, which have become one of the reference paradigms. Several studies suggested that DL-based survival prediction can outperform ML-based approaches. In particular, non-linear DL methods may be useful in survival studies [72].
Way et al. [73] identified a correlation between volumetric DL features and OS. Zadeh et al. [74] developed a CNN for SC based on histopathology (DeepSurvNet) able to classify patients into 4 distinct survival classes. DeepSurvNet achieved an accuracy of 80% on blind data. Furthermore, through the analysis of mutation frequencies, DeepSurvNet was able to capture the genetic differences between the various survival classes. The use of histopathological images was therefore beneficial for SC. Nie et al. [75] used a multichannel 3D-CNN with multimodal images. Following feature extraction by DL methods, features were entered into an SVM for SC (long versus short survival). The model reported an accuracy of 90.7%.
A variety of authors have also employed hybrid techniques [44,45,[76][77][78][79][80][81], often based on ML and DL. Some authors [81] have shown that adding genomic information significantly increases predictive accuracy (in this study, the mean root mean square error (RMSE) was reduced by 84 days compared to the use of a CNN based only on single-modality MR images). Others experimented with a model based on a neural network [79] to categorise survival into two classes and provide OS in days, showing inferior performance (accuracy = 0.59) and therefore indirectly justifying the use of deep neural networks, such as CNNs.
Recent studies [77] have also highlighted the usefulness of radiogenomics for OS prediction in days. A hypercolumn-based CNN was employed for segmentation, feature extraction, and the combination of imaging-derived biomarkers with gene expression. The radiogenomic model performed better than the individual models based on genomic or radiomic information alone. Of particular relevance, Zhang et al. [76] used a mixed technique to identify high-risk sub-regions within a lesion that may influence survival. In brief, K-means clustering was used for the initial identification of sub-regions of interest (294 in total); subsequently, a multiple-instance learning (MIL) model was used for risk stratification. High-risk regions achieved an accuracy of 87.9% in survival stratification, higher than a model built using radiomic features extracted from the gross tumor region (70.19%). Different authors [44,45] focused on the impact of radiomic features on the DL model. Feng et al. [45] developed a 3D U-Net designed to perform segmentation (since the features were designed for a segmentation task and then repurposed for a different one, classification accuracy was not high). They used a multivariate linear regression model to minimize overfitting, although at the cost of its expressiveness. Nevertheless, the authors won the OS subtask competition at the Medical Image Computing and Computer Assisted Intervention Society Brain Tumor Segmentation (MICCAI BraTS) 2018 challenge. This paper proved the feasibility of using features not linked to a specific task. Han et al. [44] developed a mixed technique (ML + DL) able to classify patients into long- and short-term survivors with a log-rank test p-value < 0.001.
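To illustrate the sub-region identification step used in [76], the following Python sketch runs a deterministic 1-D k-means over voxel intensities. The deterministic initialization and 1-D simplification are ours for illustration; the authors' implementation clusters multi-parametric voxel data, not a single intensity channel.

```python
def kmeans_1d(values, k=2, iters=50):
    """Minimal 1-D k-means: group voxel intensities into k clusters
    (candidate intra-tumoral sub-regions). Requires k >= 2."""
    lo, hi = min(values), max(values)
    # deterministic init: spread centers evenly across the intensity range
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            clusters[nearest].append(v)
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:   # converged
            break
        centers = new_centers
    return sorted(centers)
```

In the actual pipeline, the resulting clusters define candidate sub-regions whose features feed the MIL risk-stratification model.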
Numerous studies [42,58,81,82] agreed on the importance of using features derived from Perfusion-Weighted Imaging (PWI) and Diffusion-Weighted Imaging (DWI). Petrova et al. [82] identified features related to ADC and rCBV parameters as possible OS predictors. In this study, features were also ranked according to their importance; among the ADC and rCBV features, the most important were: the 95th percentile of ADC (ADC_95), the standard deviation of rCBV (rCBV_std), the standard deviation of ADC (ADC_std), and the median of rCBV (rCBV_median). Sun et al. [83] presented a DL-based framework for brain tumor segmentation and survival prediction in glioma, using multimodal MRI scans. An ensemble of three different 3D-CNN architectures, combined by majority voting for robust performance, was used for tumor segmentation. For survival prediction, 4524 radiomic features were extracted from the segmented tumor regions; a decision tree and cross-validation were then used to select relevant features. Finally, a random forest model was trained to predict OS. This method ranked 5th at MICCAI BraTS 2018, with 61.0% accuracy for classification into short-, mid- or long-survivors.
Several studies [59,73,78,[84][85][86][87][88] have shown a potential correlation between radiomic features extracted from MRI and OS, which is emerging as helpful in predicting GBM OS. Lu et al. [87] developed an ML model for predicting OS in GBM, based on the use of radiomic, clinical and semantic features, the latter based on the Visually AccesSAble Rembrandt Images (VASARI) feature scoring system. This study, based on contrast-enhanced T1-weighted (CE-T1w) imaging, showed excellent performance for OS prediction. A total of 333 radiomic features and 16 semantic (VASARI) features were extracted; following the selection and ranking of radiomic features, together with semantic and clinical features, the authors built an ML model aimed primarily at predicting MGMT promoter methylation status. MGMT methylation was then used with the previously determined set of features (radiomic, clinical and semantic) to build a second model to predict OS. Both a CPH regression model and a random survival forest (RSF) model were tested. The RSF model had the best performance, with a C-Index of 0.91.

Years: 2021-2022
More recently, we observed a consolidation in DL applications such as CNN and, often, hybrid methods consisting of ML and radiomics.
Chen et al. [89] hypothesised that combining dose-volume histogram (DVH) and clinical features into a single model could lead to better performance than using clinical features alone. Thus, they developed an ML-based model integrating clinical and DVH parameters, demonstrating that this integration can improve risk modelling. They also compared the performance of RSF and CPH to identify the best classifier. RSF performed better on the testing set, with a C-Index of 0.85. The RSF-based model obtained AUC values of 0.91, 0.88 and 0.84 in predicting survival at 1, 2, and 3 years, respectively, therefore showing good predictive accuracy. Gross tumor volume (GTV) and D99 (the near-minimum absorbed dose, i.e., the dose that covers 99% of the target volume) were also identified as potential new prognostic biomarkers, in addition to the presence of IDH1 mutation, Karnofsky performance status (KPS) and smoking status. Rathore et al. [90] showed that combining MRI, radiomic and histopathological imaging features in a single classifier can be beneficial for OS prediction, compared to classifiers based on a single feature type (MRI, radiomic or histopathological) only. The accuracy in predicting survival groups was 0.86, while the C-Index was 0.79.
Huang et al. [40] developed a method that predicts survival with random forest regressors. A V-Net, mainly focused on segmentation tasks, was used for feature extraction. This project presented an integrated framework combining segmentation and survival prediction, and achieved an average RMSE of 311.5 for survival prediction, outperforming the methods proposed by other participants in the BraTS competition. Wang et al. [91] developed a radiomic signature as a pre-treatment predictor of OS. The radiomic signature, derived from both CE-T1w and FLAIR sequences, showed better prognostic performance than signatures derived from either imaging technique individually, obtaining a C-Index of 0.798 and outperforming the use of clinical and pathological information only (C-Index of 0.675). According to these results, the radiomic signature may help to identify patients who would benefit from chemotherapy. The study identified patients with low-grade glioma (LGG) who may have worse survival and, thanks to the radiomic signature, selected the patients who may benefit the most from temozolomide (TMZ).
Although slightly inferior in performance, the approach published by Preetha et al. [39] may have a potentially significant clinical impact by reducing the need for contrast administration in serial scans. The authors generated post-contrast T1w synthetic MR images from pre-contrast T1w MR images using a deep CNN (dCNN). The quantification of the contrast-enhancing area from synthetic post-contrast T1w MRI allowed assessment of the patient's response to treatment without any significant difference from the true post-contrast T1w sequences obtained after GBCA administration. These promising results could promote the application of dCNNs in radiology to reduce the need for GBCA administration. The authors did not observe any significant difference in OS estimated using the original or the synthetically obtained images: the synthetic images showed a C-Index of 0.667, while the original images showed a C-Index of 0.673.
Various authors focused on the combined use of ML and radiomic techniques for OS prediction in brain tumors [92,93]. Chato et al. [92] focused on GBM, while Grist et al. [93] focused on paediatric brain tumors. The latter combined multi-site MRI with ML methods to predict survival in paediatric brain tumors, with the aim of stratifying patients into low- and high-risk cohorts. In Grist et al. [93], patients underwent PWI and DWI at the time of diagnosis. After conventional post-processing, a semi-supervised Bayesian survival analysis was performed. Unsupervised and supervised ML were then applied to determine sub-groups with different survival and to assess subsequent classification accuracy. A combination of DWI and PWI was able to determine two sub-groups of brain tumors with different survival characteristics. Kaplan-Meier analysis of high-grade tumors in the high- and low-risk clusters revealed a significant difference in survival characteristics (p < 0.05), which were subsequently classified with high accuracy (98%) by a single-layer Neural Network after stratified ten-fold cross-validation. The same task with a logistic regressor yielded an accuracy of 90%, indicating that the neural network was more suitable for this type of task. The model-relevant features were: uncorrected cerebral blood volume (uCBV) Region of Interest (ROI) mean, vascular leakage parameter (K2) ROI mean, uCBV whole-brain mean, tumor volume, and ADC ROI kurtosis. Tumor perfusion measures were found to be of high importance in determining survival; the authors state that perfusion was so relevant for classification purposes that it should be included in clinical imaging protocols. A particular strength of this work was that it was performed on multi-site, multi-scanner data.
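Kaplan-Meier analysis, as used in [93] to compare the high- and low-risk clusters, estimates a survival curve non-parametrically from possibly censored follow-up times. The following is a minimal Python sketch of the product-limit estimator, for illustration only:

```python
def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimator.

    times  : follow-up times
    events : 1 = death observed, 0 = censored at that time
    Returns a list of (event_time, survival_probability) steps.
    """
    order = sorted(range(len(times)), key=lambda i: times[i])
    ts = [times[i] for i in order]
    es = [events[i] for i in order]
    at_risk, s, curve, i = len(ts), 1.0, [], 0
    while i < len(ts):
        t, deaths, removed = ts[i], 0, 0
        # group all subjects sharing the same time point
        while i < len(ts) and ts[i] == t:
            deaths += es[i]
            removed += 1
            i += 1
        if deaths:
            s *= 1.0 - deaths / at_risk   # curve steps down at event times
            curve.append((t, s))
        at_risk -= removed                # censored subjects leave the risk set
    return curve
```

Two such curves (one per risk cluster) are then compared, e.g., with a log-rank test, to assess whether the survival difference between the clusters is significant.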

Overall Considerations
The main limitations we identified in the reviewed papers are related to the improper use of techniques and algorithms (often not state-of-the-art), the use of limited or suboptimal datasets (e.g., different image acquisition parameters, lack of standardized protocols, or incomplete information) and the use of retrospective cohorts. The combination of these factors had a major impact on the performance of the different methods. Table 1 summarises the characteristics and performance of the best four methods for OS prediction. Among those, the C-Index ranges from 0.79 [90] to 0.91 [87], with the best performance obtained by an RSF. A graphical representation of the C-Index of the best four methods is displayed in Figure 3.

Performance
For the SC task, a single-layer Neural Network [93], an SVM [64], decision trees [71], and a 3D-CNN plus an SVM [75] achieved the highest accuracy (>90.7%). The characteristics and performance of these four methods are summarised in Table 2; a graphical representation of their accuracy is displayed in Figure 4.

Discussion
We presented a general overview of the current literature on AI applications in predicting survival in patients with brain tumors, based on MRI. The use of AI-based techniques, such as ML and DL, appears beneficial to predict survival. After evaluating several applications, we ranked the best applications based on the performance of the different algorithms for the two tasks of interest (OS and SC). Different approaches showed high performance, and the choice of the best one to use is non-univocal and subject to different variables. Unequivocally, the use of features derived from PWI and DWI/DTI were of significant relevance for both tasks. Indeed, the use of quantitative imaging is undoubtedly advantageous for AI applications. This is of particular relevance in an era when fully-quantitative MR imaging methods are becoming increasingly available and proven to be reproducible across different vendors [94].
The use of semantic features in addition to clinical and radiomic features proved of significant relevance for the OS task, as did the use of features from clinicopathological information. The use of multiparametric MR images, compared to unimodal ones, also led to significant improvements. ML methods appeared to perform better for this task: the best four algorithms have C-Index values in the range 0.79-0.91 and employed, respectively, radiomic, clinical and semantic features [87]; PWI and DTI features [68]; clinical and DVH features [89]; and multimodal imaging with histopathological information [90]. For SC, the best performance was shown by a single-layer Neural Network used in conjunction with PWI and DWI features, and by an SVM classifier trained with volumetric, texture and shape features extracted from multimodal MR images; both achieved accuracy in the order of 98%. The methods using CNNs and decision trees performed slightly worse. Overall, the best four algorithms have accuracy values in the range 90.7-98.0%. In more detail, the best performance was shown by a single-layer Neural Network using PWI and DWI features [93], followed by an SVM with volumetric, texture and shape features from multimodal imaging [64], a decision-tree-based method [71], and a DL method based on a 3D-CNN [75].
It is also worth noting that state-of-the-art methods and algorithms may not be those used in international competitions (e.g., MICCAI BraTS) which may have intrinsic limitations. For instance, BraTS is an international challenge focused on tumor segmentation and not OS classification (which was only a subchallenge); therefore the selected features were not necessarily optimised for OS. Most of the methods presented in this context, either based on a typical ML or DL architecture, extract and use significant features to achieve the best possible segmentation (the primary task) and often employ these features also for the secondary task (OS prediction). Hence, OS prediction is performed on the features that were chosen to obtain the best possible segmentation, without building a model focused on OS prediction itself. Therefore, those non-optimised models may obtain worse performance than state-of-the-art methods exclusively focused on survival prediction.

Conclusions
In conclusion, different algorithms perform differently depending on the specific task. In particular, ML methods integrated with additional information, including clinical, radiomic, semantic and DWI/PWI information, showed the best performance for OS prediction. In the absence of such additional information, DL methods performed better.
Future studies should focus on developing ML/DL models by combining different data sources (i.e., clinical, radiomic, semantic and PWI/DWI), which are correlated and may provide complementary information [95] for improving the clinical decision-making tasks [96]. Moreover, given the proven importance of quantitative techniques such as PWI and DWI, future ML/DL models should leverage the wealth of data provided by novel and more refined diffusion and perfusion techniques [97][98][99][100][101], potentially also including information upon cerebral metabolism [99,102]. By integrating all these data into a single multimodal model, further improvements in performance could be achieved. Lastly, the research community should also plan to evaluate the impact of these integrative DL-based models, and compare their performance against analogous ML-based models.
Author Contributions: Conceptualization, all authors; methodology, C.d.N., L.R. and F.Z.; writing-original draft preparation, C.d.N., L.R. and F.Z.; writing-review and editing, all authors; supervision, L.R. and F.Z. All authors have read and agreed to the published version of the manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.