A New Era of Neuro-Oncology Research Pioneered by Multi-Omics Analysis and Machine Learning

Although the incidence of central nervous system (CNS) cancers is not high, these cancers significantly reduce patients' quality of life and result in high mortality rates. A low incidence also means a small number of cases, which in turn means a limited amount of information. To compensate, researchers have tried to increase the amount of information available from a single test using high-throughput technologies. This approach, referred to as single-omics analysis, has only been partially successful, as one type of data may not be able to appropriately describe all the characteristics of a tumor. It is presently unclear what type of data can describe a particular clinical situation. One way to solve this problem is to use multi-omics data. When many types of data are used, a selected data type or a combination of them may effectively resolve a clinical question. Hence, we conducted a comprehensive survey of papers in the field of neuro-oncology that used multi-omics data for analysis and found that most of the papers utilized machine learning techniques. This finding suggests that machine learning techniques are useful in multi-omics analysis. In this review, we discuss the current status of multi-omics analysis in the field of neuro-oncology and the importance of using machine learning techniques.


Introduction
The global incidence rate of brain and nervous system cancers is 4.63 per 100,000 person-years, and they account for 2% of all cancers [1]. Furthermore, they are the most common cause of cancer-related death in childhood (between 0 and 19 years). Glioblastoma multiforme (GBM), the most malignant primary brain tumor (glioma) according to the World Health Organization (WHO), has the worst prognosis, with a 5-year survival rate of only 6.8% [2]. Therefore, it is imperative to understand neuro-oncology more deeply and develop effective treatments; paradoxically, this is hindered by the relatively small number of patients [3] and the consequently limited information that can be obtained. One possible solution is to integrate multiple types of data, as every data type is produced by observing the same condition from a different perspective.
In the first part of this review, we give a brief history of multi-omics analysis in the field of neuro-oncology; in the second, we highlight the current knowledge on multi-omics analysis; and in the third, we discuss the future direction of research.

Beginning of Neuro-Oncology Research and Treatment
The modern accounts of neurosurgery and neuro-oncology begin with Harvey Cushing in the early 1930s [33,34]. In the field's infancy, surgical resection was the only treatment for brain tumors until a landmark prospective randomized controlled study on 1,3-bis(2-chloroethyl)-1-nitrosourea (BCNU) and/or radiation therapy was carried out in patients with anaplastic glioma in 1978 [35]. The study's protocol was very innovative, as it was the first trial to determine whether a combination of treatment methods was effective in anaplastic glioma. The integration of the three treatment methods (surgical resection, chemotherapy, and radiation therapy) was completed, for the time being, in 2005 by "radiotherapy plus concomitant and adjuvant temozolomide (TMZ) for glioblastoma" [36].

Genotyping
Previously, brain tumors could only be diagnosed by pathological analysis [37]. In 1998, Cairncross et al. discovered that the combined allelic loss of chromosomes 1p and 19q was associated with both chemosensitivity and longer recurrence-free survival after chemotherapy in patients with anaplastic oligodendrogliomas [38]. This was a monumental moment in neuro-oncology research: genetic alterations and treatment responses were linked for the first time. After this, many researchers began to enthusiastically search for clinically meaningful gene alterations. The second breakthrough occurred in 2005. In addition to gene mutation, epigenetic silencing of the O6-methylguanine-DNA methyltransferase (MGMT) DNA-repair gene was shown to be related to a good TMZ response in GBM patients [39]. In 2009, the third major discovery in the field occurred, that of isocitrate dehydrogenase 1 gene (IDH1) and IDH2 mutations in glioma patients [40], which showed that a single-point mutation alters the enzymatic activity of the encoded protein and affects tumor malignancy.

Beginning of Multi-Omics Analysis
As mentioned above, the history of treatment in the neuro-oncology field (Figure 1) begins with the validation of the effectiveness of various treatment methods; the integration of these methods is presently underway. The 2016 World Health Organization Classification of Tumors of the Central Nervous System (2016 CNS WHO) used molecular parameters in addition to histology to define tumor entities, probably for the first time [41]; i.e., the diagnosis made by the 2016 CNS WHO was an "integrated" diagnosis based on the phenotypic and genotypic classification of the tumor. The paper that can be considered the beginning of this "integration" trend was published in 2008 [42] and reported the interim results of The Cancer Genome Atlas (TCGA) pilot project (Figure 2). It provided a network view of the pathways that were altered in the development of GBM by using DNA copy number and gene expression data. This result was achieved by mapping the unequivocal genetic alterations onto major pathways that have been implicated in GBM. Although this paper did not use machine learning methods, there is no doubt that it was a landmark study in multi-omics analysis in neuro-oncology. Also, as described later, it is worth mentioning that most multi-omics studies in the neuro-oncology field use the TCGA dataset.
This paper was the starting point for many multi-omics analysis studies in neuro-oncology. The details of these studies are described in the next section.

Search Strategy
We retrieved publications by searching the PubMed database for "glioma OR glioblastoma OR medulloblastoma OR meningioma OR schwannoma AND multi omics* OR multiomic*" (* denotes a wildcard).
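The search string above does not state how the OR and AND operators are grouped. As a minimal sketch, assuming the intended reading is "any tumor term AND any multi-omics term", the query could be assembled with explicit parentheses as follows. The helper names `or_group` and `build_query` are hypothetical, and the grouping is an assumption rather than something stated in the search strategy.

```python
# Terms copied from the search strategy; the parenthesized grouping is an
# assumption, since the source query does not state operator precedence.
TUMOR_TERMS = ["glioma", "glioblastoma", "medulloblastoma", "meningioma", "schwannoma"]
OMICS_TERMS = ["multi omics*", "multiomic*"]


def or_group(terms):
    """Join terms with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(tumor_terms, omics_terms):
    """AND together the tumor group and the multi-omics group."""
    return or_group(tumor_terms) + " AND " + or_group(omics_terms)


query = build_query(TUMOR_TERMS, OMICS_TERMS)
print(query)
```

Writing the groups out explicitly removes any dependence on how a search engine resolves mixed OR/AND expressions.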

Eligibility Criteria
We selected relevant studies by screening their titles and abstracts, and then reviewed the full texts. We selected papers according to the following criteria.

1. Studies that were not review articles.
2. Studies that were focused on or related to neuro-oncology.
3. Studies that used multi-omics data.

Categories of Papers
We categorized the selected papers according to three main outputs: (i) pathways and networks: papers that aimed at discovering pathways or networks that were upregulated or downregulated in tumors, including studies aimed at detecting biomarkers; (ii) clinical status: papers whose output was a clinical status, most typically prognosis; and (iii) miscellaneous: papers that did not fit into either of the other two categories.

Overall Result
Based on the abovementioned criteria, we selected 23 papers (Table 1). The pathways and networks category was the largest, with 12 papers; the clinical status category contained 7 papers; and 4 papers were categorized as miscellaneous. In terms of the dataset employed, 18 papers, i.e., more than three-quarters of the total (18 of 23), used the TCGA dataset (Supplementary Table S1). The most commonly used input data type was gene expression, which was used in 20 studies. As shown in Supplementary Table S2, copy number change profiles, somatic mutation, and DNA methylation data were used in 13, 12, and 9 studies, respectively. Metabolic profiling, histopathological images, mRNA expression, magnetic resonance imaging (MRI), clinical data, and whole exome sequencing (WES) were used in only one study each. Importantly, most of the papers utilized machine learning techniques to perform regression, classification, clustering, and dimensionality reduction. This fact suggests that machine learning techniques are effective in the multimodal analysis of multilayered omics data.

Pathways and Networks
As the paper that was the starting point of multi-omics analysis in neuro-oncology suggested the involvement of new pathways and networks in the disease, it makes sense that the pathway and network analysis category garnered the largest number of papers [42].
Lock et al., using a matrix decomposition technique, developed a data decomposition method called joint and individual variation explained (JIVE). Using JIVE, the data were separated into a sum of three terms: a low-rank approximation capturing joint structure between data types, low-rank approximations capturing structure individual to each data type, and residual noise. They predicted the network of gene-miRNA interactions using the loadings of the joint components [43].
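The three-term decomposition can be written compactly. The symbols below are chosen here for illustration and do not appear in this review: $X_i$ is the data matrix for the $i$-th data type measured on the same samples, $J_i$ is its share of the joint structure, $A_i$ is its individual structure, and $\varepsilon_i$ is residual noise; $r$ and $r_i$ are assumed ranks.

```latex
\begin{equation}
  X_i = J_i + A_i + \varepsilon_i, \qquad i = 1, \dots, k,
\end{equation}
\begin{equation}
  \operatorname{rank}\!\begin{pmatrix} J_1 \\ \vdots \\ J_k \end{pmatrix} = r,
  \qquad
  \operatorname{rank}(A_i) = r_i .
\end{equation}
```

Under this reading, the joint term is low-rank across all stacked data types at once, whereas each individual term is low-rank only within its own data type.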
Graph theory is often used to discover meaningful pathways and networks [45,48,52]; it originated from tools used for analyzing topological problems. In 1736, the Swiss mathematician Leonhard Euler introduced the basic idea of graphs when solving the problem known as the Seven Bridges of Königsberg. Graph theory has been applied in various fields, such as social and information systems, physics, chemistry, and biology, as it is useful for representing relationships. A graph consists of nodes (also called vertices or points) and edges (also called links or lines) that connect nodes. When graph theory is applied to biological fields, proteins or genes are often nodes. Zhang et al. proposed a new network analysis method called prior information-dependent differential network analysis (pDNA), which was based on differential network analysis [48]. The analysis takes into account the following information: (i) a differential edge is less likely to exist between two genes that do not participate together in the same pathway; (ii) changes in the networks are driven by certain regulator genes that are perturbed across different cellular states; and (iii) the differential networks estimated from multi-view gene expression data are likely to share common structures. Zhang et al. applied pDNA to TCGA data (gene expression and copy number change profiles) to identify the differential networks between the proneural and mesenchymal subtypes of GBM. The results showed that four genes acted as hubs, i.e., nodes with a large degree. PDGFRA and CDK4, which are often amplified in proneural-type GBM, were among the four genes. Another example was shown by Shafi et al. (Figure 3). They created a subnetwork that consisted of methylation-driven genes, differentially expressed genes, and known interactions (protein-protein interactions) using a network propagation algorithm [52]. Several studies have used original datasets, not TCGA, to provide important results [50,51]. Li et al. identified protein and metabolic markers that correlate with TMZ and discovered a protein-metabolic regulatory network using a mouse GBM model by integrating proteomics and metabolomics. Multi-omics analysis tends to focus on complex mathematical models, but it is important to combine mathematical models with the results of biological experiments, and to conduct new experiments if needed, as in this study.
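To make the notions of differential edges and hub nodes concrete, the following minimal sketch compares two toy networks and reports the node with the largest degree in the differential network. It is a plain set-difference illustration, not the pDNA method, and the gene labels G1 to G5 and their edges are placeholders rather than genes or interactions reported in any study.

```python
from collections import Counter

# Toy edge sets for two cellular states; G1..G5 are placeholder labels,
# not genes or interactions taken from any reviewed study.
edges_state_a = {("G1", "G2"), ("G1", "G3"), ("G2", "G3"), ("G4", "G5")}
edges_state_b = {("G1", "G2"), ("G1", "G4"), ("G1", "G5"), ("G4", "G5")}


def normalize(edges):
    """Treat edges as undirected by sorting each node pair."""
    return {tuple(sorted(edge)) for edge in edges}


def differential_edges(edges_a, edges_b):
    """Return edges present in exactly one of the two states."""
    return normalize(edges_a) ^ normalize(edges_b)


def node_degrees(edges):
    """Count how many edges touch each node."""
    degrees = Counter()
    for u, v in edges:
        degrees[u] += 1
        degrees[v] += 1
    return degrees


diff = differential_edges(edges_state_a, edges_state_b)
degrees = node_degrees(diff)
hub, hub_degree = degrees.most_common(1)[0]
print(sorted(diff), hub, hub_degree)
```

In this toy case G1 touches three of the four differential edges, so it is the hub; real analyses estimate the edges statistically from expression data rather than listing them by hand.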
Figure 3. Creation of subnetworks consisting of methylation-driven genes, differentially expressed genes, and known interactions using a network propagation algorithm (modified from Reference [48]). As shown here, some pathway and network studies attempt to discover networks with genes as nodes and connections between genes as edges. Shafi et al. detected differentially expressed genes and methylated genes using the leave-one-out method. Then, they combined the result of the [...]


Clinical Status
Clinical status is often the output of multi-omics analysis. Among clinical outcomes, prognosis is the most frequent output [44,55–59]. The input data are also mostly derived from TCGA. Two unique studies are presented here (Figure 4). First, Zhang et al. performed an integrated analysis that combined histopathological images with multi-omics data (gene expression, copy number, and mRNA expression data) and clinical data [57]. They handled histopathological images as data rather than as pictures, using the open-source software CellProfiler. By doing so, histopathological images could be used as input data for the machine learning model in the same way as multi-omics data. Interestingly, they reported that combining multi-omics features with histopathological features predicted prognosis more accurately than using histopathological features alone. Chaddad et al. reported a study based on a similar perspective [58]. They used features from MRI, instead of histopathological images, as input data in a machine learning model. They reported that the combination of MRI features and multi-omics data (genomics, transcriptomics, and proteomics/IHC) achieved the highest area under the curve.
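Comparisons such as "image features alone versus image plus multi-omics features" are typically summarized by the area under the ROC curve (AUC). The sketch below computes the AUC directly from its pairwise definition, as the fraction of (positive, negative) pairs that a model ranks correctly. The labels and risk scores are made-up numbers for illustration only and are not results from any of the reviewed studies; the function name `roc_auc` is likewise chosen here.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up labels (1 = poor outcome) and risk scores from two hypothetical models.
labels = [0, 0, 1, 1, 0, 1]
scores_image_only = [0.2, 0.6, 0.5, 0.7, 0.4, 0.3]
scores_combined = [0.1, 0.55, 0.6, 0.9, 0.3, 0.5]

auc_image = roc_auc(labels, scores_image_only)
auc_combined = roc_auc(labels, scores_combined)
print(round(auc_image, 3), round(auc_combined, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a higher value for the combined feature set would indicate better discrimination on the evaluated samples.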
Xiong et al. demonstrated that the average tumor purity, calculated from multi-omics data by multiple methods, correlated with prognosis [59]. This study differs from others in that it predicts prognosis indirectly. Kamoun et al. focused on oligodendroglial tumors [54], not GBM, using an original dataset, the Prise en charge des oligodendrogliomes anaplasiques (POLA) cohort. First, they demonstrated the validity of their integrative clustering technique, named the cluster of clusters, by showing a strong correlation between the resulting classification and 1p/19q co-deletion and IDH mutation status. Next, they showed three subgroups within 1p/19q co-deleted tumors that were associated with the specific expression patterns of nervous cell types: oligodendrocyte, oligodendrocyte precursor cell (OPC), and neuronal lineage. Last, they reported that the OPC-like group is associated with more aggressive clinical and molecular patterns, including MYC genomic gain, MAX genomic loss, MYC hypomethylation, and microRNA-34b/c downregulation.

Miscellaneous
The studies categorized as miscellaneous are unique and interesting [61–64]. A method has been established to create cancer cell lines and animal models from GBM surgical specimens [65,66]. Rosenberg et al. measured and compared the molecular profiles of a set of parental tumors and paired GBM patient-derived cell lines (GBM-PDCLs) by using multi-omics analysis [61]. According to their report, the molecular profiles of GBM-PDCLs and their paired parental tumors resemble each other overall; however, some driver aberrations are lost or gained in the passage from tumor to GBM-PDCL.
Cancer is a diverse disease, and two people with the same cancer often respond differently to the same stimuli. To address this variability, Bouhaddou et al. built a mechanistic mathematical model that describes the interactions between commonly mutated pan-cancer signaling pathways [62]. They tailored the model using multi-omics data from the MCF10A cell line, a non-transformed mammary epithelial cell line, and trained the model using existing reports and new experimental results to refine biochemical parameters and phenotypic predictions. They reported that their model, when tailored to glioma, could predict an increase in the sensitivity of glioma cell line death to AKT inhibition.
In recent years, neoantigens have received a lot of attention due to their possible role in prognosis and in the effect of immunotherapy. Nejo et al. evaluated neoantigen expression in 25 paired primary and recurrent glioma samples by using multi-omics data [63].
Liu et al. conducted a study that aimed to identify candidate glioma biomarkers using multi-omics analysis [64]. The study was characterized by the sheer volume of data, drawn from five public datasets. First, the authors searched for brain-specific biomarkers. Then, they narrowed their search down to those detectable in the cerebrospinal fluid and, finally, to the biomarkers specific to glioma. As a result, they reported that protein kinase C gamma (PRKCG) has great potential as a glioma-specific biomarker.

Discussion and Future Directions
In this review article, we describe multi-omics analysis in the field of neuro-oncology, focusing on the main output and input data. Although research on neuro-oncology is increasing, only 23 papers met our review criteria. The diversity of research is not high and the field is still in its infancy. However, we believe multi-omics analysis will play a central role in the precision medicine era for the following reasons. The most important and innovative point of multi-omics analysis is its ability to handle different types of information in parallel and integrate them for human use. In fact, multi-omics analysis is something that human doctors or researchers have been doing unconsciously. Assume that a doctor predicts the prognosis of a cancer patient. In this case, a skilled doctor would consider not only tumor type and driver gene mutation but also Karnofsky Performance Status, the patient's medical images, and blood tests, as well as sex, age, and family history. This is because the doctor knows empirically that the prediction is more accurate when all of these are taken into account. However, this human-dependent multi-omics analysis has a limitation in that it lacks reproducibility and explainability.
Nonetheless, as we have reviewed above, the combination of multi-omics analysis and machine learning could solve this problem. Importantly, judging from the fact that machine learning techniques were utilized in most of the papers presented in this review paper, it can be concluded that machine learning is a useful technique in multi-omics analysis. For this reason, we believe that the following properties of machine learning techniques, which we have previously introduced [67], are important.

1. Multimodal learning: Different types of medical data (genomic, epigenomic data, etc.) can be integrated and treated as input.
2. Multi-task learning: Multiple different tasks can be learned simultaneously by sharing part of the model.
3. Representation learning and semi-supervised learning: Acquiring a representation of the data from a large amount of unlabeled data, which can then be learned from a small amount of labeled data.
4. Automatic acquisition of hierarchical features: Higher-order correlations of inputs can be captured.
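
As a minimal illustration of the first property, the sketch below shows one simple way to treat several data types as a single input: each modality is standardized feature by feature and the per-sample vectors are then concatenated (often called early fusion). This is a deliberately simple stand-in chosen here, not the method of any reviewed paper; the function names and the numeric values are made up for illustration.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each feature column to zero mean and unit variance."""
    columns = list(zip(*rows))
    scaled_columns = []
    for col in columns:
        mu, sigma = mean(col), pstdev(col)
        # Constant columns carry no information; map them to zeros.
        scaled_columns.append([(x - mu) / sigma if sigma > 0 else 0.0 for x in col])
    return [list(row) for row in zip(*scaled_columns)]


def early_fusion(*modalities):
    """Concatenate per-sample feature vectors from several modalities."""
    n_samples = len(modalities[0])
    if any(len(m) != n_samples for m in modalities):
        raise ValueError("all modalities must describe the same samples")
    scaled = [zscore_columns(m) for m in modalities]
    return [sum((block[i] for block in scaled), []) for i in range(n_samples)]


# Made-up values: 3 samples, 2 expression features and 1 methylation feature.
expression = [[5.0, 100.0], [7.0, 300.0], [9.0, 200.0]]
methylation = [[0.2], [0.8], [0.5]]

fused = early_fusion(expression, methylation)
print(fused)
```

Standardizing before concatenation keeps a modality with large numeric values from dominating the fused vector; models that learn a separate representation per modality are a common alternative.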
Research that handles and integrates information that has not historically been considered analytical data, such as radiological and histopathological images, is also expected to become important in the future [57,58]. In this review, we have focused on papers on the multimodal analysis of omics information, such as multilayered genomic information. Recently, radiomics and radiogenomics, which integrate radiological images with clinical information and omics data, have been attracting attention, and successful examples have been published in the field of neuro-oncology [68,69]. Several excellent reviews of radiomics and radiogenomics are also available for interested readers [70–72].
One of the most important findings from our survey is the attempt to reproduce human-dependent multi-omics analysis with machine learning models. All data associated with a disease are obtained by observing the disease from different angles. So far, these fragments of the disease have been integrated by humans, but they are expected to be integrated by machine learning models in the future (Figure 5). Multi-omics analysis will allow us to understand the nature of the disease more deeply and has the potential to change all medical fields, for example by advancing drug discovery, evaluating therapeutic effects, predicting prognosis, and identifying the best treatment for each patient [9,11,14,29–32,67,73–82]. In addition, as the flow of information is bidirectional, it may be possible to reduce the noise of individual data with integrated information. It has been suggested that integrated data may be more robust than individual data. Thus, multi-omics analysis and machine learning techniques are just beginning to open the door to a new era in neuro-oncology.
Figure 5. The concept art for future multi-omics analysis. Various types of data are obtained for garnering a better understanding of the nature of the disease and are integrated by machine learning models as well as humans.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/biom11040565/s1, Table S1: Studies reviewed in Table 1 grouped by input data category. Table S2: Studies reviewed in Table 1 grouped by input dataset.