Assessing Metabolic Markers in Glioblastoma Using Machine Learning: A Systematic Review

Glioblastoma (GBM) is a common and deadly brain tumor with late diagnoses and poor prognoses. Machine learning (ML) is an emerging tool that can create highly accurate diagnostic and prognostic prediction models. This paper aimed to systematically search the literature on ML for GBM metabolism and assess recent advancements. A literature search was performed using predetermined search terms. Articles describing the use of an ML algorithm for GBM metabolism were included. Ten studies met the inclusion criteria for analysis: diagnostic (n = 3, 30%), prognostic (n = 6, 60%), or both (n = 1, 10%). Most studies analyzed data from multiple databases, while 50% (n = 5) included additional original samples. At least 2536 data samples were run through an ML algorithm. Twenty-seven ML algorithms were recorded with a mean of 2.7 algorithms per study. Algorithms were supervised (n = 24, 89%), unsupervised (n = 3, 11%), continuous (n = 19, 70%), or categorical (n = 8, 30%). The mean reported accuracy and AUC of ROC were 95.63% and 0.779, respectively. One hundred six metabolic markers were identified, but only EMP3 was reported in multiple studies. Many studies have identified potential biomarkers for GBM diagnosis and prognostication. These algorithms show promise; however, a consensus on even a handful of biomarkers has not yet been reached.


Introduction
Glioblastoma multiforme (GBM) is the most common primary malignant brain tumor in the United States, accounting for approximately 56.6% of all gliomas and 47.7% of all primary malignant central nervous system tumors [1]. GBM is 1.58 times more common in males than females, and the annual incidence of GBM is 2.53 per 100,000 population. The highest rate of diagnosis falls within the group aged 75 to 84 years; however, the median age of diagnosis is 65 years. Globally, the incidence is highest in North America, Northern and Western Europe, and Australia [2]. When accounting for race and ethnicity, incidence rates are highest among non-Hispanic whites and lowest among American Indians and Alaskan Natives. Furthermore, 1- and 5-year survival rates are lowest among non-Hispanic whites and highest among American Indians and Alaskan Natives [3].
The prognosis of GBM is notably grim, with a 1-year relative survival rate of 41.4% and a 5-year survival rate of 5.8% following diagnosis [1,3,4]. Negative prognostic factors include advanced age, incomplete resection, and poor mental performance status while the inverse of these factors each indicates a slightly better prognosis [4]. Furthermore, practice clinicians [25]. For diagnostic applications, a study from Zhou et al. incorporated liquid chromatography and mass spectrometry into an SVM-based ML algorithm to diagnose malignant brain gliomas (MBGs) by means of plasma lipid biomarker analysis. This method was shown to be a reliable noninvasive screening method for the diagnosis of MBGs [26]. The enhancement of diagnostic/prognostic methods integrated with ML algorithms allows physicians to assign patients more accurate prognoses for the expeditious implementation of treatment plans and conceivably better patient care as the technology continues to improve [4,10].
The heterogeneous nature of GBM, along with high rates of recurrence and therapeutic resistance, necessitates the timely identification of novel therapeutic targets in the metabolism of GBM to remain ahead of this rapidly evolving disease [27]. Recent efforts to identify such targets have utilized tumor omics data integrated with clinical information by use of ML techniques [27]. However, there is still a paucity of literature concerning GBM metabolism and ML. To our knowledge, there has been no review on ML and GBM metabolism. Therefore, in this review, the authors systematically search the literature on ML and GBM metabolism and assess recent advancements, with commentary on future developments in this novel and emerging field of study.

Strategy and Registration
This study was performed in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. This systematic review was registered on PROSPERO, with details of our initial protocol, and can be accessed at https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42022367758 (accessed on 25 December 2022) [28].

Search and Data Sources
A literature search was performed using PubMed (Medline), Embase, Cochrane, OVID, and Web of Science databases from 1975 to October 2022. The following predetermined search terms were used: "metabolism" or "biomarkers" and "glioblastoma" and "artificial intelligence" or "machine learning" or "deep learning" or "predictive model" as title and abstract keywords (Supplementary Materials S1).

Selection Criteria
Articles obtained from searching the specified databases were imported into the Covidence platform (Veritas Health Innovation) for screening. The screening was independently performed by two investigators (Z.D.N. and C.B.) by title and abstract, and later by full-text review. Conflicts were resolved by consensus. When consensus could not be obtained, a third reviewer (N.P.) broke the tie. Articles describing the use of an ML algorithm for GBM metabolism were included. Additional inclusion criteria included research on human GBM metabolism and a confirmatory diagnosis of GBM for validity. Review articles, case reports, commentary, conference abstracts, unpublished articles, editorials, and purely technical descriptions were excluded. The language was restricted to English.

Data Extraction
Using a data extraction form, a quality and bias assessment was performed based on the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) [29] for diagnostic studies and the Quality Assessment of Prognostic Accuracy Studies (QUAPAS) [30] for prognostic studies. QUAPAS is adapted from QUADAS-2, so the two can be compared directly. Studies that featured both diagnostic and prognostic models were assessed with both tools and the results were combined. All studies were also assigned a level of evidence rating based on the American Association of Neurological Surgeons (AANS) and Congress of Neurological Surgeons (CNS) joint guidelines for diagnostic and prognostic studies organized into classes (I, II, and III) [31].
The following variables were extracted from the included studies: publication year, lead author, country of origin, study population, the origin of patient data (original or database), type of ML algorithm used, and source of biologic sample (plasma, tissue, etc.). Our primary outcome was the accuracy of the ML algorithm. The secondary outcome, if reported, was the set of top metabolic markers identified by each study.

Search Results
Our initial search yielded 317 records with 235 articles remaining after the removal of duplicates (Figure 1). These articles were screened by title and abstract, which returned 46 articles for a full-text review. Thirty-one articles were excluded, leaving 14 final studies included in this review, ten of which were included in the analysis [32-41]. Four studies were not included in the analysis because they did not perform an ML algorithm on GBM; however, they did discuss the topic [21,42-46].
Metabolites 2022, 12, x FOR PEER REVIEW 4 of 20
Extraction results from each paper are tabulated below (Table 1). All ten analyzed papers were published in or after 2018, except one (2012). The nationalities of the papers were as follows: one Canadian, four Chinese, and one each Indian, Pakistani, Polish, Swiss, and American. Three papers (30%) were diagnostic, six were prognostic (60%), and one featured both (10%).

Quality and Bias and Level of Evidence
A quality and bias assessment was performed as described above, with results tabulated in the Supplementary Materials (Table S1). The most significant weakness in the included studies was the lack of data on study design. Fifty-one percent (n = 24) of risk of bias questions were answered as "unsure" due to insufficient information (Figure 2). Patient selection was also a substantial source of bias, as most studies used some form of national database with little information provided on the patient population or, if used, the control population. Conversely, 78% (n = 29) of applicability concern questions were rated as "low" risk of bias (Figure 3). Nine studies (90%) received a level of evidence rating of II, while one study (10%) received a rating of I, based on AANS and CNS joint guidelines for diagnostic and prognostic studies.

Patient Samples and Databases
Samples were collected from patients as either tumor/healthy brain tissue (n = 7, 70%) or serum/plasma (n = 3, 30%). Most studies analyzed data from multiple national databases such as The Cancer Genome Atlas (TCGA) (National Institutes of Health) (n = 6, 60%), Chinese Glioma Genome Atlas (CGGA) (Beijing Neurosurgical Institute) (n = 2, 20%), Gene Expression Omnibus (GEO) (National Institutes of Health) (n = 3, 30%), as well as from original samples (n = 5, 50%). However, only three articles (30%) analyzed data from multiple sources. At least 2536 data samples were run through an ML algorithm; however, due to the usage of the same databases by multiple papers at various points of database completeness, an exact number of unique samples could not be determined.

Machine Learning and Accuracy
Twenty-seven ML algorithms were found in our analysis, 18 of which were unique (67%) (Figure 4). The least absolute shrinkage and selection operator (LASSO) and support vector machine (SVM) were the two most common methods and were featured in five and four studies, respectively. A mean of 2.7 ML methods was utilized in each study; however, only 50% (n = 5) of papers featured more than two methods of ML. Of the 27 ML methods used, 24 were supervised (89%) and three were unsupervised (11%), while 19 were continuous (70%) and eight were categorical (30%). A summary of each ML method is given in Table 2.
Table 2. A summary of each ML method listed in these studies. This includes methods that were mentioned; some may not have been utilized but are still included for educational purposes [32-41]. * Duplicate methods omitted.

Machine Learning Algorithm Definition
Linear Regression (ACE)
Linear regression is a type of supervised ML algorithm used for predictive modeling. It is used to match observed data with a linear equation to model the correlation between the independent variables and the dependent variable [45]. ACE is a linear regression algorithm specifically designed for use with gene expression data.

Logistic Regression
Logistic regression is a statistical method used to construct a regression model when the response variable is binary. As a supervised ML algorithm, it predicts a binary outcome (e.g., Yes/No, True/False) from a set of independent variables [46].

Random Forest *
Random forest is a supervised ML method that creates decision trees and combines them to improve the accuracy of the predictions. It uses a technique called bagging, where each tree is trained on a random grouping of the data [45].

Extra Tree Classifier
Extra tree classifier is a supervised ML algorithm that utilizes a decision tree-based ensemble method. It operates by constructing a set of decision trees and then training them with a random subset of the features. The final class prediction is created by combining all of the trees' individual class predictions. Extra tree uses more randomization when splitting nodes than is seen in a random forest algorithm [47].

Decision Tree
A decision tree is a type of supervised ML algorithm that is used for classification and regression. It makes predictions based on the feature values of input instances by constructing a tree-like model of decisions and their possible consequences [45].

SVM (Support Vector Machines) *
Support vector machines is a supervised ML algorithm that is used for classification and regression. It works by finding the best boundary (or "hyperplane") that separates the different classes [45].

ANN (Artificial Neural Networks)
ANN is a class of supervised ML algorithms that are modeled after human neuronal structure and can be applied to a variety of tasks, including the classification of images and the processing of natural language. They are made up of interconnected artificial neurons that can be trained to adjust the weights of connections between nodes. They can use a variety of architectures, including feedforward, convolutional, and recurrent neural networks [45].
• CNN is a type of neural network that is commonly utilized in the recognition of images and videos. It uses convolutional layers to learn spatial hierarchies of features automatically and adaptively from input data [48].
• Backpropagation is an ML algorithm for multilayer feedforward artificial neural networks (FFNN). Backpropagation is used to train these networks to produce a desired output for a given input [49].
• Dropout is a regularization technique for neural networks, which aims to reduce overfitting by randomly setting a portion of the neurons to zero during training. This helps to avoid overfitting by preventing the network from becoming too specialized for the training set [50].
• PASNet is a deep learning algorithm that combines feature selection and neural networks to predict disease-gene associations. It is used to identify the genes that are important for a specific disease, by incorporating information about biological pathways into the prediction process [39].

XGBoost (eXtreme Gradient Boosting)
XGBoost is a gradient-boosting supervised ML algorithm designed to be efficient and scalable. It can be used for both classification and regression [51].

K-Means
K-Means is an unsupervised ML clustering algorithm that groups similar n-dimensional observations into k clusters, where k is predefined. The algorithm iteratively assigns points to the closest centroid and updates each centroid to the mean of its assigned points [52].

LASSO (Least Absolute Shrinkage and Selection Operator) *
LASSO is a supervised regularization method for linear regression models. It shrinks the absolute values of the independent variable coefficients toward zero, which helps to prevent overfitting by reducing the model's complexity [53].
• LASSO-penalized Cox regression is a method that combines LASSO regularization with the Cox proportional hazards model. It aims to identify a subset of features that are most important for survival analysis while minimizing overfitting [54].
• Logistic LASSO is a regularization method for logistic regression. It is a combination of LASSO and logistic regression, and it aims to identify the subset of features that are most important for predicting binary outcomes while also minimizing overfitting [55].
• Random LASSO is a variant of LASSO that uses randomization to improve the feature selection process. It randomly assigns weights to the features before applying LASSO, which can help to reduce the variance of the feature selection results [56].

PCA (Principal Component Analysis) *
PCA is an unsupervised dimensionality reduction technique. The intention is to convert a group of correlated factors into a group of uncorrelated factors. It does this by transforming the data to a new coordinate system whose axes represent the directions of maximum variance in the data [57].

RSF-SRC (Random Survival Forest-Survival Regression and Classification)
RSF-SRC is a potentially unsupervised ML method for predicting the time-to-event (TTE) outcome in survival analysis (other variations may be supervised). It is an extension of the random forest algorithm, can handle censoring and truncation of time-to-event data, and can be used for both regression and classification [58].

PLS-DA (Partial Least-Squares Discriminant Analysis)
PLS-DA is a supervised ML algorithm that is used for classification. It works by finding a group of latent variables, which are linear combinations of the original variables, that explain the differences between the classes [59].

Naïve Bayes
Naïve Bayes is a supervised ML algorithm that is used for classification. It makes predictions based on the probability of certain features appearing in each class. It is called "naïve" because it assumes that all features are independent, which may not always be true [45].
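As an illustration of the LASSO entry in Table 2, the shrinkage of coefficients toward zero can be sketched with a small coordinate descent routine. This is a minimal sketch, not the implementation used by any of the reviewed studies: the function names, the toy data, the penalty value, and the choice to omit an intercept are all assumptions made for illustration.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimize (1/2n)*||y - Xw||^2 + alpha*||w||_1 (no intercept term)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iter):
        for j in range(d):
            # Correlation of feature j with the residual that ignores feature j.
            rho = 0.0
            for i in range(n):
                partial = sum(X[i][k] * w[k] for k in range(d) if k != j)
                rho += X[i][j] * (y[i] - partial)
            rho /= n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha) / z if z else 0.0
    return w


# Hypothetical toy data: the outcome depends only on the first feature.
X = [[1.0, 1.0], [2.0, -1.0], [3.0, 1.0], [4.0, -1.0]]
y = [2.0, 4.0, 6.0, 8.0]

weights = lasso_coordinate_descent(X, y, alpha=0.5)
print(weights)  # the second coefficient is driven exactly to zero
```

With a nonzero penalty, the coefficient of the uninformative feature is set exactly to zero while the informative coefficient is shrunk slightly below its unpenalized value, which is the feature selection behavior that makes LASSO attractive for high-dimensional omics data.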
Nine algorithms (33%) reported accuracy values and 18 (67%) reported area under the curve of the receiver operating characteristic (AUC of ROC) values, while only 6 (22%) reported neither. Because the output of unsupervised ML cannot be verified against a ground truth, accuracy or AUC of ROC values were considered "missing" only when they were not reported for supervised ML methods. Only one paper met this criterion [33]. The mean reported accuracy was 95.63% [85.70%, 100.00%], while the mean AUC of ROC was 0.779 [0.590, 1.000].
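For readers less familiar with the two performance measures summarized above, the following sketch shows how accuracy and AUC of ROC can be computed for a binary classifier. The labels, scores, and 0.5 decision threshold are hypothetical toy values, not data from the reviewed studies, and the AUC is computed with the pairwise rank formulation rather than by tracing the curve.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """AUC of ROC as the probability that a random positive sample
    scores higher than a random negative sample (ties count one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = GBM, 0 = control.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
predicted = [1 if s >= 0.5 else 0 for s in scores]

print(accuracy(labels, predicted))
print(roc_auc(labels, scores))
```

Accuracy depends on the chosen decision threshold, whereas AUC of ROC summarizes ranking quality across all thresholds, which is one reason the two measures are not directly comparable between studies.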

Metabolic Markers
One hundred six metabolic markers were identified as the top predictive biomarkers from the analyzed studies. Of these, 23 (22%) were used for diagnosis and 83 (78%) were used for prognostication. Only one metabolic marker, EMP3, was reported in multiple studies; all other biomarkers were reported only once in their respective studies.

Discussion
Despite the innovations within the field of GBM research, prognoses remain poor. The studies within this review aim to improve diagnostic and prognostic accuracy by utilizing novel ML algorithms. Although this field exhibits an extensive level of research, there is a paucity of literature pertaining to the ML algorithms used to identify markers underlying GBM metabolism [42].

Supervised Machine Learning
Supervised ML is broadly used in a predictive scenario where a "ground truth" value can be determined (e.g., a diagnosis of GBM) and the user wishes to identify similar data sets with an unknown "ground truth." The supervised ML algorithms used by these studies were SVM, random forest, ANN, deep neural networks (e.g., PASNet), DT, NB, partial least-squares discriminant analysis (PLS-DA), logistic regression models, and LASSO-penalized Cox regression analysis [32,34-43]. In one study, a logistic regression model outperformed other supervised ML algorithms, such as SVM and random forest models, in the classification of the IDH mutation. Specifically, the logistic regression model scored higher on AUC of ROC, balanced accuracy, F1 score, precision, recall, and Matthews correlation coefficient (MCC) [43].
Supervised ML algorithms are powerful tools in the identification of GBM biomarkers. One study found that by extracting a small amount of peripheral blood (5 µL), a surface-enhanced Raman scattering (SERS) signal-trained supervised ML algorithm was able to distinguish GBM cancer from noncancer without isolating cells. The PLS-DA algorithm exhibited both high sensitivity and specificity. A confirmation test with an ANN validated the previous outcome, and the ANN was crucial in determining the prognosis of the disease [32]. Congruently, Gollapalli et al. used a PLS-DA algorithm to distinguish between GBM patients and healthy controls using predetermined biomarker subsets, achieving a high level of classification performance. Results from this study were confirmed with DT, SVM, and NB algorithms [41].
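The comparison metrics named above can all be derived from a binary confusion matrix. The sketch below uses the standard definitions; the function name and the confusion-matrix counts are hypothetical and are not taken from the cited study.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0          # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    balanced_accuracy = (recall + specificity) / 2
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "balanced_accuracy": balanced_accuracy,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts, e.g., IDH-mutant (positive) vs. IDH wild-type (negative).
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Balanced accuracy and MCC are less sensitive to class imbalance than plain accuracy, which matters when one class (for example, IDH-mutant tumors) is much rarer than the other.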

Unsupervised Machine Learning
Unsupervised ML techniques are generally used when a user wishes to understand and perhaps categorize their data, without knowing their primary data "ground truth." The unsupervised ML algorithms used by these studies were K-means and an integrated Kernel PCA. Unsupervised ML methods such as K-means have been used to create continuous clustering models. These models use metabolism-related genes to create stratified clusters with calculated similarity distances between GBM samples [37]. Furthermore, deep neural networks (e.g., PASNet) have been integrated into prediction models, along with kernel principal component analysis (KPCA), as methods to forecast prognostic survival analyses from high-throughput data [39]. The literature on deep learning networks in GBM metabolism is sparse, likely due to the complicated methodology involved in the construction of these algorithms. Sometimes, a combination of both unsupervised and supervised ML is useful. Riviere-Cazaux et al. contrasted the heterogeneity between GBM patients based on IDH status and patient identity. The team utilized PCA for an unbiased evaluation of patient groupings, followed by PLS-DA to identify the predictive variables between the groups. This is a good example of how different algorithms can be combined to achieve results [44].
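The clustering step described above can be illustrated with a minimal K-means sketch. The two-dimensional points stand in for hypothetical expression values of two metabolism-related genes, and initializing the centroids from the first k points is a simplification chosen to keep the example deterministic; neither is drawn from the reviewed studies.

```python
import math


def kmeans(points, k, n_iter=100):
    """Plain K-means: assign each point to its closest centroid, then
    move each centroid to the mean of its assigned points."""
    centroids = [list(p) for p in points[:k]]  # simplification: first k points
    assignments = None
    for _ in range(n_iter):
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


# Hypothetical samples described by two gene-expression features.
samples = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.1, 4.9), (0.9, 1.1), (4.8, 5.2)]
assignments, centroids = kmeans(samples, k=2)
print(assignments)
print(centroids)
```

Because no labels are used, the resulting clusters have to be interpreted afterwards, for example by comparing survival between clusters.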

Metabolic Markers
A major prognostic metabolic marker researched throughout these studies is isocitrate dehydrogenase-1 (IDH1), which when correlated with various mutations of that marker, as well as when utilized for the characterization of patients, was found to be overexpressed in both high- and low-grade GBM patients [33,37,40]. The type of IDH1 (wild-type vs. mutation) was found to impact the degree of prognosis between different unsupervised clusters of patients, implicating it as a possible prognostic marker, although it should be mentioned that there was a statistically significant difference in age between these clusters [35]. Moreover, IDH status was a risk factor identified as one of the prognostic classifiers with a statistically significant high hazard ratio [40]. Additionally, several studies emphasized the importance of matching metabolic pathway markers with associated genetic alterations in a stepwise fashion to predict the prognosis of GBM patients with greater accuracy [37,39]. The emphasis of ML identification on IDH1 overexpression in the progression of GBM gives credence to an antimetabolic approach, as decreasing the activity of this pathway could impair the growth and development of GBM tumors [42]. However, IDH1 and its role in GBM metabolism have been heavily researched in the literature and are not a new discovery of the ML papers discussed here. Rather, many of these papers used IDH1-positive samples as a starting point for further analysis with ML algorithms.
Other important metabolic markers are the levels of dysregulated amino acids. These amino acids have been identified as a product of activated or deactivated metabolic pathways in GBM to increase nutrient availability for tumors [34]. Many of those amino acids were discovered in previous experiments and then analyzed by ML in these studies to differentiate patients' glioma grading, thereby ascertaining a method for a more precise diagnosis [34,42]. In fact, Firdous et al. found that their diagnostic study utilizing an extra tree classifier, logistic regression integration, and random forest algorithms had greater predictive accuracy than any previous study of ML algorithms on the identification of metabolic markers in tissue or liquid-based biopsies [34].
A study conducted by Zeng et al. found that UDP-glucose pyrophosphorylase 2 (UGP2) was an upregulated enzyme that exhibited a significant effect on the prognosis of GBM. They identified this marker using a random survival forest algorithm, which is a type of supervised ML method. The overexpression of UGP2 was correlated with a worse prognosis and a higher grade of pathology. As a result, UGP2 may be a useful prognostic marker for GBM patients [38].
In addition, Kałuzińska et al. utilized multiple SVM algorithms to classify the top genes present in multiple types of cancers, including GBM. The team concluded that WWOX-dependent biomarkers PLEK2 and GCSH are possible GBM biomarkers and should serve as a triad along with RRM2. Further investigation is needed pertaining to PLEK2 and GCSH to analyze their prognostic accuracy and ability to differentiate between GBM versus alternative gliomas [36].
Lastly, several independent studies have identified EMP3 as a prognostic gene for high-grade gliomas [35,40]. It has been shown to function as a reliable indicator for prognosis at the mRNA level [40]. In fact, EMP3 was the only gene identified in more than one study. As such, further research involving this genetic marker has the potential to improve the prognostic process for patients diagnosed with glioblastoma.
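The check for markers reported by more than one study amounts to counting marker occurrences across studies. The sketch below uses only the handful of markers and citation numbers named in this Discussion as an illustrative subset; it is not the full set of 106 extracted markers, and the dictionary layout is an assumption.

```python
from collections import Counter

# Illustrative subset: markers named in the Discussion, keyed by citation number.
markers_by_study = {
    "[35]": {"EMP3"},
    "[36]": {"PLEK2", "GCSH"},
    "[38]": {"UGP2"},
    "[40]": {"EMP3"},
}

# Count in how many studies each marker appears.
counts = Counter(
    marker for markers in markers_by_study.values() for marker in markers
)

# Markers reported by more than one study.
shared = sorted(marker for marker, n in counts.items() if n > 1)
print(shared)
```

Applied to a complete extraction table, the same count makes the lack of overlap between studies explicit and easy to update as new studies are added.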

Future Directions
Machine learning has the ability to greatly improve prognostic and diagnostic capabilities for GBM. However, an integration of ML algorithms for biomarker detection combined with radiomics-based tumor imaging will be necessary to ascertain the greatest level of accuracy and precision [21]. By analyzing the characteristics of the tumor such as shape, size, and texture, radiomics can provide valuable information on the tumor's current state and progression. Combining the two ML approaches to analyze the quantitative data from both imaging and biomarkers could, once perfected, improve disease outcomes more than either method alone.
Overall, our findings highlight the importance of further research in this evolving field in order to fully grasp the potential of ML in the diagnosis and prognosis of GBM. Advancements in this area may significantly enhance patient care and treatment outcomes for individuals affected by this devastating disease in the future.

Conclusions
Machine learning is a cutting-edge technology that analyzes data and makes predictions or decisions using algorithms and statistical models. It is a formidable research tool and has the potential to completely change how complex diseases such as glioblastoma are studied and understood. The goal of machine learning is to recognize and categorize unknown data samples using training data. The studies we reviewed have found novel insights into the mechanisms of GBM and identified potential biomarkers for diagnosis and prognostication by utilizing this technology in the study of GBM metabolism.
Arguably one of ML's most significant advantages is its ability to adapt and improve over time as it processes more data, making it ideal for dealing with complex and dynamic tasks. This is particularly useful for assignments that traditional rule-based systems are unable to manage. Additionally, machine learning can automate tasks that would normally require human intervention, increasing efficiency and decreasing error rates. This can lead to cost savings, increased productivity, and more accurate decision making.
Conversely, machine learning has drawbacks that must be considered despite its benefits. One significant shortcoming is the requirement for large quantities of high-quality training data, which can be expensive and challenging to come by. Furthermore, it can be challenging to understand how ML models make decisions and how to optimize them because the results can be ambiguous and difficult to interpret. It is also important to remember that the effectiveness of ML models depends considerably on the caliber of the data, as well as the specific task they are assigned. Therefore, it is essential to take these factors into account when implementing ML into medical research.
GBM is a complicated disease whose underlying biological mechanisms are poorly understood, making diagnosis and treatment challenging. The use of ML algorithms has demonstrated considerable promise in the enhancement of diagnostic and prognostic capabilities for GBM patients; however, a consensus on even a handful of biomarkers discovered with ML algorithms has not yet been reached. Many researchers are still exploring this new field and there is still much to be learned. Despite the challenges and limitations, the potential of ML in the study of GBM metabolism is clear.