Identifying Tumor-Associated Genes from Bilayer Networks of DNA Methylation Sites and RNAs

Network theory has attracted much attention from the biological community because of its high efficacy in identifying tumor-associated genes. However, most researchers have focused on single networks of single omics, which have less predictive power. With the available multiomics data, multilayer networks can now be used in molecular research. In this study, we achieved this with the construction of a bilayer network of DNA methylation sites and RNAs. We applied the network model to five types of tumor data to identify key genes associated with tumors. Compared with the single network, the proposed bilayer network resulted in more tumor-associated DNA methylation sites and genes, which we verified with prognostic and KEGG enrichment analyses.


Introduction
Biological processes include complex molecular interactions that can be efficiently characterized by network models [1]. Since the early 2000s, much attention has been paid to the structure and function of molecular interaction networks. Metabolic, protein, and gene coexpression networks possess small-world and scale-free characteristics [2][3][4][5][6]. Additionally, the robustness of the molecular interaction networks can help us understand the mechanisms behind complex diseases [7,8]. In particular, disease-associated genes need to be identified from the network perspective [9][10][11].
Although experimental methods can more accurately identify disease-associated genes, they are costly and time consuming [12]. Therefore, many computational approaches have been proposed to identify disease-associated genes, for example, through gene expression analysis or some machine learning methods [13,14]. However, these gene expression analyses are subjected to many uncertainties, and therefore they are not very accurate [15], but machine learning methods often lead to overfitting and the results cannot be adequately explained [16]. Because of their accuracy and interpretability, molecular interaction networks have attracted considerable attention from the biological community. Specifically constructing a proper network model and using it to identify disease-associated genes are fundamental problems. In the literature, many approaches have been developed through nodal centrality. For instance, the tumor-associated genes in gene coexpression networks have more edges than nonassociated genes, requiring the adoption of the degree centrality to identify these genes [17]. Additionally, the betweenness centrality, defined on the basis of shortest paths, has been adopted in protein-protein interaction networks to identify tumor-associated proteins [18]. On the basis of these two measures, many variants have been developed, such as the PageRank centrality [19] and the HITS algorithm [20]. The different centrality indicators enable the identification of hub nodes in the network from different perspectives [21].

Data Collection and Preprocessing
We obtained DNA methylation datasets (Illumina Human Methylation 450K, level 3), gene expression datasets (IlluminaHiSeq_RNASeqV2, level 3), and clinical datasets for lung squamous cell carcinoma (LUSC), breast cancer (BRCA), endometrioid cancer (UCEC), kidney cancer (KIRC), and bladder cancer (BLCA) from the TCGA database [31]. The β values were derived at the Johns Hopkins University and University of Southern California TCGA genome characterization center, which are continuous variables between 0 and 1, representing the ratio of the intensity of the methylated bead type to the combined locus intensity [32]; we removed the probes with β values of "NA" and those not in the gene regions considered in our analysis. Clinical data included time of death and death status of patients (Table S1).
We obtained the RNA binding relationships from the RNAInter database [33], which is a comprehensive RNA interactome resource. The database scores the confidence level of each interaction relationship by combining literature support and experimental validation, where a score above 0.75 indicates the existence of strong experimental evidence for the interaction relationship [34].

Network Construction
Currently, DNA methylation networks are generally constructed on the basis of the correlation coefficients between sites [35]. However, DNA methylation sites mainly regulate gene expression by recruiting proteins involved in gene expression or by inhibiting the binding of transcription factors to DNA, with no direct relationship between the sites [36]. Therefore, networks based on Pearson correlation coefficients between the sites are inaccurate. The methylation level of a site affects the expression of the corresponding genes [37], which motivated us to construct a bilayer network of DNA methylation sites and RNAs.

Construction of the DNA Methylation and RNA Interaction Networks
We determined the differences in the DNA methylation sites between tumor and paraneoplastic tissue using empirical Bayes' moderated t-test method, contained in the limma package [38] in R (version 4.0.3, Guido Masarotto, AT, 2020). To reduce the risk of false positives, we adjusted p-values with the Benjamini-Hochberg false discovery rate (FDR) method. We used an FDR < 0.01 as the significance threshold [39]. We then joined the edges using the cutoff of the Pearson's correlation coefficients between sites. The Pearson correlation coefficient [40] between the different DNA methylated sites was defined as follows: where n is the number of samples in the tumor dataset, X i is the level of the DNA methylation of the i th sample, and X is the average level of the DNA methylation at the site. We calculated all r values with the C language program. The screening of edges in correlation networks is mainly based on the hypothesis test p-value or the cutoff of correlation coefficients. Because the p-value is easily affected by the sample size [41], we adopted a cutoff of the correlation coefficients to construct the DNA methylation network. Here, two DNA methylation sites were linked if the correlation coefficient |r| > 0.8 [42]. RNAs can regulate each other through binding relationships [43], which mainly depend on the structure of the RNA and the base sequence [44] and are relatively stable [45]. For the RNA layer, we therefore used the binding relationship between RNAs. For the edges of the RNA layer, we used the RNA interactions with confidence scores >0.75 from the RNAInter database, which are supported by strong experimental evidence [34].

Construction of Bilayer Network of DNA Methylation Sites-RNAs
The links between the DNA methylation layer and the RNA layer are the DNA methylation sites connected with the corresponding genes. Using the edge information in the RNA layer, we filtered the edges in the DNA methylation layer. Specifically, we required two methylation sites to satisfy one of the following conditions: (i) being located on the same gene; (ii) the corresponding genes connected at the RNA layer; and (iii) the corresponding genes did not have an edge in the RNA layer, but they connected to an intermediate gene. Edges between DNA methylation sites that did not meet one of the above conditions were removed, even though they were strongly correlated with each other. The above criteria were determined using the networkX package in python (version 3.8.1).

Degree Centrality (DC)
Degree centrality [46] represents the number of connected edges that a node has with other nodes in the network. The degree centrality of node v i is defined as Life 2023, 13, 76 4 of 17 where N is the total number of nodes in the network, and A represents the adjacency matrix of the network. The larger the degree of centrality of a node, the more important it is the in the network [3] (Figure 1).

Betweenness Centrality (BC)
Betweenness centrality [47] is a measure of the participation of a node in the shortest paths in a network. The betweenness centrality of node v i is defined as where σ st is the number of shortest paths from s to t, and σ st (v i ) is the number of shortest paths from s to t that pass through node v i . In biological networks, nodes with high betweenness centrality generally play a key role in the connectivity of the network, such as communicating two modules and serving as a bridge to connect them [48] (Figure 1).

Degree Centrality (DC)
Degree centrality [46] represents the number of connected edges that a node has with other nodes in the network. The degree centrality of node is defined as where is the total number of nodes in the network, and represents the adjacency matrix of the network. The larger the degree of centrality of a node, the more important it is the in the network [3] (Figure 1).

Betweenness Centrality (BC)
Betweenness centrality [47] is a measure of the participation of a node in the shortest paths in a network. The betweenness centrality of node is defined as , where is the number of shortest paths from to , and is the number of shortest paths from to that pass through node . In biological networks, nodes with high betweenness centrality generally play a key role in the connectivity of the network, such as communicating two modules and serving as a bridge to connect them [48] (Figure 1). In this study, only the DNA methylation layer was different in the bilayer network for various tumors, and therefore we performed the centrality analysis only for the DNA methylation layer.

Average Degree
The average degree, usually denoted by , is the average of the degrees of all nodes [49]: The average degree can indicate how many neighbors the nodes have, on average, in the network.

ER Random Network
The ER random network model [49] is an equal-opportunity network model. In this model, given a certain number of nodes, a node has the same probability of inter-relationship (connection) with any other node, denoted as the edge probability of the network. In this study, only the DNA methylation layer was different in the bilayer network for various tumors, and therefore we performed the centrality analysis only for the DNA methylation layer.

Average Degree
The average degree, usually denoted by k , is the average of the degrees of all nodes [49]: The average degree can indicate how many neighbors the nodes have, on average, in the network.

ER Random Network
The ER random network model [49] is an equal-opportunity network model. In this model, given a certain number of nodes, a node has the same probability of interrelationship (connection) with any other node, denoted as the edge probability p of the network.

Clustering Coefficient
The clustering coefficient [49] is a coefficient used to describe the level to which nodes in a graph form clusters with each other. It considers not only the number of neighbors of a node, but also the relationships between neighboring nodes. For example, the number of neighbors of node i is k i . These neighboring nodes have at most k i (k i − 1)/2 edges between them. The clustering coefficient C i for node i is defined as the ratio of the number of edges E i formed between the neighboring nodes of that node and the maximum number of possible edges: The higher the clustering coefficient, the more compact the network. The average clustering coefficient C network is the average of the clustering coefficients of all nodes in the

Shortest Path Length
The shortest path length, d ij between nodes i and j is defined as the number of edges on the shortest path connecting these two nodes [49]. The average shortest path is defined as the average of the paths between any two nodes in the network, and it reflects the tightness of the network: The concept of the shortest path was used to find functional clusters in biological systems [51].
We calculated all the above metrics in a network using the networkX package in python (version 3.8.1).

Chi-Squared Test
The chi-squared test is used to test the level of deviation between the actual observed and theoretically inferred values of a sample [52]. The null hypothesis is that the observed frequencies do not differ from the expected frequencies, and the alternative hypothesis is that the observed frequencies differ from the expected frequencies. The chi-squared test statistic is defined as where f o and f e represent the observed and theoretical values, respectively. For the chisquared test of column independence, the degrees of freedom are d f = (R − 1)(C − 1), where R and C denote the number of rows and columns in the table, respectively. The chisquared test requires the degree of freedom to determine the significance level of the statistic. The chi-squared table or statistical software can be used calculate the corresponding p value according to the chi-squared value and the degree of freedom. We conducted the chisquared test in this study with the CHISQ.TEST function in Excel [53]. We considered p < 0.05 as statistically significant.

Log Rank Test
The log rank test is used to test the significant differences in the location of the distribution of the overall population in which the test data are located in the case of an arbitrary overall distribution [54]. By arranging the observations in ascending order, each observation is numbered in order, which is called the rank. The test is then performed by calculating the rank sum for each of the two groups of observations. The null hypothesis is that the overall distribution of the two groups is the same, and the alternative hypothesis is that the overall distribution of the two groups is different. The rank sum of the smallest group of sample size is used as the t-test statistic. In this study, we performed the log rank test using the lifelines package in Python (version 3.8.1). We considered p < 0.05 as statistically significant, indicating the distribution of the two groups was significantly different.

Identification of Differentially Expressed Genes (DEGs)
We determined the differentially expressed genes between the tumor and paraneoplastic tissues using the empirical Bayes' moderated t-test method, contained in the limma package [38] in R (version 4.0.3). We calculated log2 (fold change) using the average expression of the two groups of genes. The thresholds were FDR < 0.05 and | log2 (fold change) | > 1 [55].

Survival Analysis
We used survival analysis to examine the relationship between the DNA methylation sites and overall survival (OS). We divided patients into high-and low-risk groups on the basis of the mean β value of the site. We analyzed the difference between the two groups with KM analysis [56] on the basis of the lifelines package in Python (version 3.8.1). A log rank p < 0.05 was considered statistically significant.

KEGG Pathway Enrichment Analysis
We performed functional enrichment analysis for genes that we found only in the bilayer network. We used the Database for Annotation, Visualization, and Integrated Discovery (DAVID) [57] tool for the KEGG enrichment analysis based on the hypergeometric distribution to compute the p-values [58]. We set p < 0.05 as the threshold value.

Characteristics of DNA Methylation Sites-RNAs Bilayer Network
On the basis of the DNA methylation data from the TCGA database and the RNA interaction information from the RNAInter database, we obtained bilayer networks of the DNA methylation sites and RNAs ( Figure 2) for five types of tumors: LUSC, BRCA, UCEC, KIRC, and BLCA.
We determined the differentially expressed genes between the tumor and paraneoplastic tissues using the empirical Bayes' moderated t-test method, contained in the limma package [38] in R (version 4.0.3). We calculated log2 (fold change) using the average expression of the two groups of genes. The thresholds were FDR < 0.05 and | log2 (fold change) | > 1 [55].

Survival Analysis
We used survival analysis to examine the relationship between the DNA methylation sites and overall survival (OS). We divided patients into high-and low-risk groups on the basis of the mean β value of the site. We analyzed the difference between the two groups with KM analysis [56] on the basis of the lifelines package in Python (version 3.8.1). A log rank p < 0.05 was considered statistically significant.

KEGG Pathway Enrichment Analysis
We performed functional enrichment analysis for genes that we found only in the bilayer network. We used the Database for Annotation, Visualization, and Integrated Discovery (DAVID) [57] tool for the KEGG enrichment analysis based on the hypergeometric distribution to compute the p-values [58]. We set 0.05 as the threshold value.

Characteristics of DNA Methylation Sites-RNAs Bilayer Network
On the basis of the DNA methylation data from the TCGA database and the RNA interaction information from the RNAInter database, we obtained bilayer networks of the DNA methylation sites and RNAs ( Figure 2) for five types of tumors: LUSC, BRCA, UCEC, KIRC, and BLCA. Flow of DNA methylation-RNA bilayer network analysis. Nodes in DNA methylation layer are blue, representing DNA methylation sites; nodes in the RNA layer are red, representing RNAs. We constructed the DNA methylation network on the basis of the cut-off correlation coefficient and the RNA network on the basis of RNA binding. According to the relationship between the genes corresponding to DNA methylation sites in the RNA layer, we performed edge filtering between the DNA methylation sites. After the bilayer network was constructed, we analyzed the hub nodes according to centrality indices. Flow of DNA methylation-RNA bilayer network analysis. Nodes in DNA methylation layer are blue, representing DNA methylation sites; nodes in the RNA layer are red, representing RNAs. We constructed the DNA methylation network on the basis of the cut-off correlation coefficient and the RNA network on the basis of RNA binding. According to the relationship between the genes corresponding to DNA methylation sites in the RNA layer, we performed edge filtering between the DNA methylation sites. After the bilayer network was constructed, we analyzed the hub nodes according to centrality indices.
Although the average degree in the DNA methylation layer considerably varied (e.g., 19 for LUSC compared with 4 for UCEC), the degree distributions of the five tumors showed right-skewed behavior, implying a scale-free characteristic ( Figure 3). In this scenario, a few nodes in the network had a large number of edges, and thus they were identified as hubs. We also noticed the small-world property. As shown in Table 1, the average clustering coefficient was high, and the average shortest path length was low. We obtained all the results from the edge-filtered DNA methylation layer, which was much sparser than the single DNA methylation network directly generated from correlations. this scenario, a few nodes in the network had a large number of edges, and thus they were identified as hubs. We also noticed the small-world property. As shown in Table 1, the average clustering coefficient was high, and the average shortest path length was low. We obtained all the results from the edge-filtered DNA methylation layer, which was much sparser than the single DNA methylation network directly generated from correlations.  Next, we analyzed the structural properties of the RNA layer. Despite various tumors, the structure remained unchanged, containing 8087 RNAs and 20,128 RNA binding relationships. Most of nodes in the network were mRNAs, and most of edges were produced between noncoding RNAs and mRNAs, in agreement with the situation in reality. Moreover, the average numbers of edges connected to lncRNA, miRNA, and mRNA were 15.207, 12.941, and 2.883, respectively, implying that the noncoding RNAs were more central. Finally, we recovered the scale-free and small-world features of the RNA network.

Correlation of Hubs with Tumor Development Process
Hub nodes play an important role in biological processes [58]. To identify these nodes, various centrality metrics have been adopted in the study of biological networks [3,59], among which the degree centrality [46] and betweenness centrality [47] are commonly used because of their efficacy and interpretability. Next, we applied these two centralities to rank the nodes with importance in the DNA methylation layer, and we present the most important nodes in Table 2.
According to screening based on the degree centrality, the most critical node for LUSC was Cg25080152, corresponding to the gene MYC, which is a target gene for cancer therapy [60]. The most important node we identified in BRCA was Cg24771570, which corresponds to the gene GRB2. Most cancer malignancies are caused by abnormal signaling of the Grb2 adaptor molecule [61]. Cg14751398 was the largest hub node in the UCEC network, located on E2F3, which is linked to poor prognosis in some cancers as an oncogenic factor [62]. Cg08311343 was the most significant node in the KIRC network, which is located on the CDK6 gene. CDK6 is able to regulate the cell cycle, and its inhibitors have been used as effective therapeutic drugs for breast cancer [63]. In the BLCA network, the most critical DNA methylation site was Cg12931157, corresponding to the gene NFYA, which is associated with cell-cycle alterations and cell proliferation as a transcription factor and is closely related to several tumors [64].
Our ranking of nodes with importance based on the betweenness centrality revealed that the most important node for LUSC was Cg08133058, corresponding to the gene Life 2023, 13, 76 9 of 17 SASH1, which is a prognostic indicator and a potential therapeutic target in non-small-cell lung cancer [65]. We identified Cg26383454, located on the SMIM13 gene, as the most important DNA methylation site for BRCA, which is a membrane-associated protein.
The key node identified for UCEC was Cg18776056, located on the gene FKBP4, which is a progestin receptor cochaperone protein associated with cancer malignancy [66]. We identified Cg19858017 for KIRC, corresponding to the gene CLSTN1, which can be used as a biomarker for a variety of cancers [67,68]. Finally, the most critical node in the BLCA network was Cg01473187, corresponding to the gene TSPAN6, which is a suppressor of Ras-driven cancer [69]. In summary, all the genes corresponding to the hub nodes identified were closely associated with tumors according to the two centrality measures.
To illustrate the fact that the bilayer network could find more tumor-associated DNA methylation sites than the single DNA methylation correlation networks, we further compared the number of prognostically correlated loci among the hub DNA methylation sites found by the two approaches [34]. We calculated the number of survival-associated loci among the top 100-500 DNA methylation sites with the largest degree and betweenness centralities for the five tumor datasets (Figure 4, Table S2). In this scenario, the results of the chi-squared test for all the tumors showed that the betweenness centrality for the bilayer network was better than that of the single network because more prognostic-associated DNA methylation sites were identified. For the degree centrality, although the chi-squared test results showed that the bilayer network was better than the single network only for one cancer, the rest of the bilayer networks marginally outperformed the corresponding single network. This finding can be explained as a result of the filtering of edges between the DNA methylation sites through the information in the RNA layer, which enhanced the authenticity of the network. Thus, the betweenness centrality of the bilayer network was substantially improved compared with that of the single network. The degree centrality ranks the importance of a node by the number of its neighbors. In a biological network, the more a node interacts with other nodes, the more important the node. In contrast, the betweenness centrality assesses node importance by counting the number of times that it serves as the shortest path in the network. In a biological network, this shortest path is closely related to actual biological pathways. A node with a large betweenness centrality is located at the intersection of multiple critical pathways in the DNA methylation layer, and a contiguous edge in the methylation layer represents a pathway in the RNA layer where the node is also at the intersection of critical pathways, and therefore the identified node is more important to the network.
itors have been used as effective therapeutic drugs for breast cancer [63]. In the BLCA network, the most critical DNA methylation site was Cg12931157, corresponding to the gene NFYA, which is associated with cell-cycle alterations and cell proliferation as a transcription factor and is closely related to several tumors [64].
Our ranking of nodes with importance based on the betweenness centrality revealed that the most important node for LUSC was Cg08133058, corresponding to the gene SASH1, which is a prognostic indicator and a potential therapeutic target in non-small-cell lung cancer [65]. We identified Cg26383454, located on the SMIM13 gene, as the most important DNA methylation site for BRCA, which is a membrane-associated protein.
The key node identified for UCEC was Cg18776056, located on the gene FKBP4, which is a progestin receptor cochaperone protein associated with cancer malignancy [66]. We identified Cg19858017 for KIRC, corresponding to the gene CLSTN1, which can be used as a biomarker for a variety of cancers [67,68]. Finally, the most critical node in the BLCA network was Cg01473187, corresponding to the gene TSPAN6, which is a suppressor of Ras-driven cancer [69]. In summary, all the genes corresponding to the hub nodes identified were closely associated with tumors according to the two centrality measures.
To illustrate the fact that the bilayer network could find more tumor-associated DNA methylation sites than the single DNA methylation correlation networks, we further compared the number of prognostically correlated loci among the hub DNA methylation sites found by the two approaches [34]. We calculated the number of survival-associated loci among the top 100-500 DNA methylation sites with the largest degree and betweenness centralities for the five tumor datasets (Figure 4, Table S2). In this scenario, the results of the chi-squared test for all the tumors showed that the betweenness centrality for the bilayer network was better than that of the single network because more prognostic-associated DNA methylation sites were identified. For the degree centrality, although the chi-squared test results showed that the bilayer network was better than the single network only for one cancer, the rest of the bilayer networks marginally outperformed the corresponding single network. This finding can be explained as a result of the filtering of edges between the DNA methylation sites through the information in the RNA layer, which enhanced the authenticity of the network. Thus, the betweenness centrality of the bilayer network was substantially improved compared with that of the single network. The degree centrality ranks the importance of a node by the number of its neighbors. In a biological network, the more a node interacts with other nodes, the more important the node. In contrast, the betweenness centrality assesses node importance by counting the number of times that it serves as the shortest path in the network. In a biological network, this shortest path is closely related to actual biological pathways. A node with a large betweenness centrality is located at the intersection of multiple critical pathways in the DNA methylation layer, and a contiguous edge in the methylation layer represents a pathway in the RNA layer where the node is also at the intersection of critical pathways, and therefore the identified node is more important to the network.

Correlations between DNA Methylation Sites Located on the Same Gene
In general, involvement in the same biological process or similarity in gene function leads to gene coexpression. To verify this property for comethylation [70], we calculated the correlation coefficients between hub the DNA methylation sites and present the heat maps in Figure 5 and the subnet formed by these hubs in Figures S1-S10. Overall, we noticed a strong correlation between them, implying that the identified hubs from the network corresponding to genes are likely located within the same biological pathways or perform similar functions. Using prognostic analysis and subsequent pathway enrichment analysis, we found that many of the hub nodes are associated with cancer. For the DNA methylation sites on the same gene, the correlation between them tended to be stronger. For example, among the top 100 DNA methylation sites in the BRCA network, multiple sites, such as cg27588093, cg21160149, cg04988794, cg27523417, and cg17421241, are all located on the PRDM16 gene, and we found a strong Pearson correlation coefficient between them, | | 0.781 (the average correlation of the top For the DNA methylation sites on the same gene, the correlation between them tended to be stronger. For example, among the top 100 DNA methylation sites in the BRCA network, multiple sites, such as cg27588093, cg21160149, cg04988794, cg27523417, and cg17421241, are all located on the PRDM16 gene, and we found a strong Pearson correlation coefficient between them, | r | avg = 0.781 (the average correlation of the top 100 DNA methylation sites resulting from the betweenness centrality was 0.277). This gene is a protein-coding gene that encodes a zinc finger transcription factor that suppresses tumor production [71]. Among the top 100 DNA methylation sites in the BLCA network, multiple sites, such as cg24701780, cg24804145, cg15192120, and cg25497530, are all located on the PTPRN2 gene, and the average correlation between them was | r | avg = 0.737, which was larger than that averaged over the top 100 DNA methylation sites. This gene encodes a protein with sequence similarity to the receptor-like protein tyrosine phosphatase, which accelerates cancer progression and metastasis [72,73]. In Table S3, we summarize the correlation coefficients for the sites on the same gene. In general, they are stronger than those of the top 100 sites, in agreement with the literature [74].

Hubs in DNA Methylation Layer Aggregates Differentially Expressed Genes
To obtain deeper insight into the screened DNA methylation sites and their corresponding genes, we counted the number of differentially expressed genes near the top 100-500 DNA methylation sites ranked by degree centrality and betweenness centrality, separately. We found that the genes corresponding to the DNA methylation sites screened using either measure in the bilayer networks were accessible in two steps to the differentially expressed genes. On the contrary, only a small fraction of the genes corresponding to sites obtained from the single network could reach the differentially expressed genes in two steps. We show the results in Table 3 and Table S4. All tumor data differed, according to the chi-squared test, in both single and bilayer networks (χ 2 p < 0.001). The hub nodes in the DNA methylation layer of the bilayer network aggregated differentially expressed genes. As suggested by Le et al. [11], differentially expressed genes play a crucial role in tumor development.  LUSC  55  100  60  100  BRCA  51  100  51  100  UCEC  70  100  63  100  KIRC  63  100  59  100  BLCA  54  100  54  100 However, this clustering of differentially expressed genes does not mean that all genes corresponding to hub DNA methylation sites are differentially expressed. The genes corresponding to the top 100 DNA methylation sites according to the centrality ranking are rarely differentially expressed and most of them are linked to differentially expressed genes through some noncoding RNA. For example, in the top 100 sites for LUSC, most genes are related to SNHG16 or some other miRNA. SNHG16 is a lncRNA regulating a number of mRNAs in the RNA layer that can be reached in two steps via SNHG16. Because noncoding RNAs regulate multiple mRNAs and mRNAs are regulated by multiple noncoding RNAs, these noncoding RNAs act as bridges between the mRNAs in the RNA layer.
The larger the betweenness centrality of a node, the higher the number of shortest paths traveling through that node. Therefore, the betweenness centrality can be used to identify hub nodes that are located in key pathways in the network that are likely to be involved in the expression of differential genes. For the degree centrality, the hubs identified on the basis of it are more likely to interact with differential genes because they have many neighbors. Similar to the results of the prognostic analysis, the results of the differential expression analysis were also better under the betweenness centrality than under the degree centrality. All genes corresponding to the top 500 DNA methylation sites according to the betweenness centrality reached the differentially expressed genes in two steps for all tumors. Under the degree centrality, only four tumors exhibited the same behavior.

KEGG Pathway Analysis
As mentioned above, the genes corresponding to the top 100-500 DNA methylation sites in terms of the degree and betweenness centralities could be screened in both the bilayer and single networks. The shortest paths in the network are closely related to biological pathways. The genes identified by the betweenness centrality have a high probability of being located in the critical pathway. Analogously, the genes identified on the basis of the degree centrality are likely to be located on some critical pathways due to their large number of neighbors. To explore the functions of those genes, we performed KEGG pathway enrichment analysis (Figures 6 and 7, Table S5). Figure 6 shows the KEGG pathway enrichment results of screening the top 100 genes on the basis of the betweenness centrality, where multiple KEGG pathways are associated with tumors. For hub genes in the bladder cancer bilayer network, the most important pathway is "bladder cancer", in addition to "adherens junction" and "proteoglycans in cancer". The hub genes in the endometrioid cancer bilayer network are enriched in the "PI3K-Akt signaling pathway" and "p53 signaling pathway". The p53 signaling pathway is one of the most well-known cancer-related pathways, playing an integral role in multiple tumors [75]. Proteoglycans in cancer is an important cancer-related pathway closely related to the immune escape of tumor cells [76]. The PI3K-Akt signaling pathway plays an essential role in the regulation of cell survival, growth, and proliferation [77]. Moreover, several cancer-related pathways are also enriched, including "pathways in cancer", "bladder cancer", and "microRNAs in cancer".
ife 2023, 13, x FOR PEER REVIEW 13 of 18 in two steps for all tumors. Under the degree centrality, only four tumors exhibited the same behavior.

KEGG Pathway Analysis
As mentioned above, the genes corresponding to the top 100-500 DNA methylation sites in terms of the degree and betweenness centralities could be screened in both the bilayer and single networks. The shortest paths in the network are closely related to biological pathways. The genes identified by the betweenness centrality have a high probability of being located in the critical pathway. Analogously, the genes identified on the basis of the degree centrality are likely to be located on some critical pathways due to their large number of neighbors. To explore the functions of those genes, we performed KEGG pathway enrichment analysis (Figures 6 and 7, Table S5).   Figure 6 shows the KEGG pathway enrichment results of screening the top 100 genes on the basis of the betweenness centrality, where multiple KEGG pathways are associated with tumors. For hub genes in the bladder cancer bilayer network, the most important pathway is "bladder cancer", in addition to "adherens junction" and "proteoglycans in cancer". The hub genes in the endometrioid cancer bilayer network are enriched in the "PI3K-Akt signaling pathway" and "p53 signaling pathway". The p53 signaling pathway is one of the most well-known cancer-related pathways, playing an integral role in multiple tumors [75]. Proteoglycans in cancer is an important cancer-related pathway closely related to the immune escape of tumor cells [76]. The PI3K-Akt signaling pathway plays an essential role in the regulation of cell survival, growth, and proliferation [77]. Moreover, several cancer-related pathways are also enriched, including "pathways in cancer", "bladder cancer", and "microRNAs in cancer".
The enrichment results of the top 100 screened genes based on the degree centrality are shown in Figure 7. We likewise identified cancer-related pathways, such as "hippo signaling pathway", "central carbon metabolism in cancer", "PI3K-Akt signaling pathway", and "bladder cancer". In summary, the bilayer network approach could find genes involved in tumor-related processes that could not be found by the single networks.

Discussion
DNA methylation can influence life processes by regulating gene expression and is therefore associated with the development of various tumors [70]. In this study, we constructed a bilayer network using DNA methylation and RNA interaction data from five tumors and identified a set of hub DNA methylation sites and genes using centrality indicators. Both the DNA methylation and the RNA layer networks showed the scale-free and small-world characteristics that are essential in biological networks. The majority of the DNA methylation sites screened using the centrality metric were also associated with prognosis, and the bilayer network outperformed the single network, enabling the iden- Figure 7. Gene-enriched KEGG pathways screened from bilayer networks according to degree centrality. The X-axis represents count and ratio of hub genes, dot size represents the number of hub genes, and dot color represents the negative of the logarithm of the p-value.
The enrichment results of the top 100 screened genes based on the degree centrality are shown in Figure 7. We likewise identified cancer-related pathways, such as "hippo signaling pathway", "central carbon metabolism in cancer", "PI3K-Akt signaling pathway", and "bladder cancer". In summary, the bilayer network approach could find genes involved in tumor-related processes that could not be found by the single networks.

Discussion
DNA methylation can influence life processes by regulating gene expression and is therefore associated with the development of various tumors [70]. In this study, we constructed a bilayer network using DNA methylation and RNA interaction data from five tumors and identified a set of hub DNA methylation sites and genes using centrality indicators. Both the DNA methylation and the RNA layer networks showed the scale-free and small-world characteristics that are essential in biological networks. The majority of the DNA methylation sites screened using the centrality metric were also associated with prognosis, and the bilayer network outperformed the single network, enabling the identification of more prognosis-associated sites. By analyzing the correlation between the DNA methylation sites, we illustrated that the sites on the same gene are more strongly correlated. In addition, we found that differentially expressed genes near the hub sites were enriched. Finally, our KEGG analysis revealed that hub genes in the RNA layer were involved in multiple tumor-related pathways.
Regarding the hub nodes identified in the DNA methylation layer, several issues are worth discussing. First, the genes corresponding to the most critical DNA methylation sites were mRNAs, which are closely associated with tumors. For the RNA layer, however, noncoding RNAs are located at more central positions. This is partly due to most of the DNA methylation sites being located on protein-coding genes. We found that 298,715 DNA methylation sites are in the gene region, of which 297,057 are on mRNA and only 1658 are on noncoding RNAs such as lncRNA. Additionally, in the RNA layer, a noncoding RNA can act as a bridge between two mRNAs, which results in the two mRNAs being accessible in two steps, and the DNA methylation sites on these mRNAs are not filtered out. Second, because the DNA methylation sites on the genes that perform similar functions are comethylated, we calculated the correlations between the hub sites, finding that these hubs always showed more positive correlations between them. Although DNA methylation sites do not directly interact, the positive correlation between the hub sites is actually a reflection of the site-gene-gene-site relationship. We speculate that the reason for this positive correlation may be related to the regulation of gene expression by DNA methylation sites as well as post-transcriptional regulation. A combination of site-gene correlations as well as gene-gene correlations may be required to explain this finding.
In the RNA layer, we found that noncoding RNAs act as key bridges in the network. These noncoding RNAs contain lncRNAs and miRNAs, with a smaller number of noncoding RNAs at the central of the network and a larger number of mRNAs at the margin of the network. Only 7 pairs of mRNAs are directly linked, whereas 5,981,746 pairs of mRNAs are indirectly linked through a noncoding RNA. Therefore, the correction of the RNA layer to the DNA methylation layer is affected only by noncoding RNA. The difference in the status of noncoding RNAs and mRNAs also shows that our network is able to allow for some errors in the RNA layer.
Overall, the proposed bilayer network framework has higher fidelity than traditional correlation networks and can be used to effectively analyze multi-mics data to identify many tumor-associated DNA methylation sites and genes that cannot be identified by single networks. We suggest three avenues for future study. First, the methylation of loci is mainly regulated by methylation-modifying proteins, which we did not consider in this study. Protein layers can be incorporated into our multilayer network. Second, we used only TCGA data in the present study, and other data sources may be added for validation in a subsequent study. Finally, for the identification of hub nodes in the network, we only used two typical centrality metrics. Other popular metrics are worth considering.
Supplementary Materials: The following supporting information can be downloaded at https: //www.mdpi.com/article/10.3390/life13010076/s1. Table S1: Basic TCGA data information. Table  S2: Comparison of prognostic correlation numbers of DNA methylation sites in top 100 to top 500 centrality. Table S3: Correlation between DNA methylation sites on the same gene. Table S4: Number of differential genes within two steps of DNA methylation sites of top 100 to top 500 centrality. Table  S5: KEGG enrichment results of hub genes in top 100 to top 500. Figures S1-S10: Hub nodes found by degree and betweenness centralities constituting the subnetwork of the five tumors.  Data Availability Statement: We used public data accessible from the Cancer Genome Atlas (TCGA) database and RNAInter database.

Conflicts of Interest:
The authors declare no conflict of interest.