Kernel Differential Subgraph Analysis to Reveal the Key Period Affecting Glioblastoma

Glioblastoma (GBM) is a fast-growing type of malignant primary brain tumor. To explore the mechanisms in GBM, complex biological networks are used to reveal crucial changes among different biological states, which reflect on the development of living organisms. It is critical to discover the kernel differential subgraph (KDS) that leads to drastic changes. However, identifying the KDS is similar to the Steiner Tree problem that is an NP-hard problem. In this paper, we developed a criterion to explore the KDS (CKDS), which considered the connectivity and scale of KDS, the topological difference of nodes and function relevance between genes in the KDS. The CKDS algorithm was applied to simulated datasets and three single-cell RNA sequencing (scRNA-seq) datasets including GBM, fetal human cortical neurons (FHCN) and neural differentiation. Then we performed the network topology and functional enrichment analyses on the extracted KDSs. Compared with the state-of-art methods, the CKDS algorithm outperformed on simulated datasets to discover the KDSs. In the GBM and FHCN, seventeen genes (one biomarker, nine regulatory genes, one driver genes, six therapeutic targets) and KEGG pathways in KDSs were strongly supported by literature mining that they were highly interrelated with GBM. Moreover, focused on GBM, there were fifteen genes (including ten regulatory genes, three driver genes, one biomarkers, one therapeutic target) and KEGG pathways found in the KDS of neural differentiation process from activated neural stem cells (aNSC) to neural progenitor cells (NPC), while few genes and no pathway were found in the period from NPC to astrocytes (Ast). These experiments indicated that the process from aNSC to NPC is a key differentiation period affecting the development of GBM. Therefore, the CKDS algorithm provides a unique perspective in identifying cell-type-specific genes and KDSs.


Introduction
Glioblastoma (GBM) is one of the most common and lethal primary tumors, which has a poor prognosis and patients usually survive less than 15 months following diagnosis [1,2]. It is notoriously difficult to treat due to its diffuse nature and our limited knowledge of its molecular pathogenesis [3]. The important steps for determining the optimal therapeutic strategies are understanding the mechanisms of the dynamic processes and identification of new potential biological modules.
Compared with the bulk RNA sequencing, single-cell RNA sequencing (scRNA-seq) can provides important information for inter-cellular transcriptomic heterogeneity and dissecting the interplay Figure 1. The overall framework for criterion to explore the kernel differential subgraph (CKDS). KDS: kernel differential subgraph.
Differential expressed genes (DEGs) are detected from the processed scRNA-seq data by R package 'edgeR'. Similar to the analysis of differential gene expression, we use 'p-value, p.adjust and log2FC' to obtain the DEGs by the gene expression. The R function 'p.adjust' is to adjust the p-values by 'false discovery rate (fdr)' method [15]. Fold change (FC) [16] is calculated simply as the ratio of the difference between final value and the initial value over the original value. In the field of bioinformatics, we commonly use log2 for expressing the FC (log2FC). The genes with the p-value < 0.01, p.adjust < 0.05 and | FC| > 2 are considered as DEGs.

Single-Cell Transcriptome Network Construction by Differential Expressed Genes
Single-cell transcriptome data may lead to high false positives [17]. Therefore, integrated multiomics data analysis has become a trend to solve it in biological network analysis [18]. Proteomics and transcriptomics data are integrated to construct a network [19], in which protein-protein networks (PPN) are used as a backbone network, and Pearson correlation coefficient (PCC) between expression of each pair of genes is used as the weight of edge. In this work, DEGs are connected with known protein-protein interactions (PPIs) documented in STRING database (v10.5, https://stringdb.org/cgi/input.pl). Previous research has shown that when applied to real data, only edges with top 10% PCC were reserved [20]. In the generated STRING network, compared with the original one, over 90% edges disappear, and due to the generic property that the network structure would remain stable during the stable biological stage. At the same time, in order to ensure the effectiveness of PCC, we set |PCC| ≥ 0.6 [21]. Thus, the association of two differential genes is defined as the weight of the edge, and only genes with the value of top 10% PCC and |PCC| ≥ 0.6 are reserved. The overall framework for criterion to explore the kernel differential subgraph (CKDS). KDS: kernel differential subgraph.

Single-Cell Transcriptome Network Construction by Differential Expressed Genes
Single-cell transcriptome data may lead to high false positives [17]. Therefore, integrated multi-omics data analysis has become a trend to solve it in biological network analysis [18]. Proteomics and transcriptomics data are integrated to construct a network [19], in which protein-protein networks (PPN) are used as a backbone network, and Pearson correlation coefficient (PCC) between expression of each pair of genes is used as the weight of edge. In this work, DEGs are connected with known protein-protein interactions (PPIs) documented in STRING database (v10.5, https://string-db.org/cgi/ input.pl). Previous research has shown that when applied to real data, only edges with top 10% PCC were reserved [20]. In the generated STRING network, compared with the original one, over 90% edges disappear, and due to the generic property that the network structure would remain stable during the stable biological stage. At the same time, in order to ensure the effectiveness of PCC, we set |PCC| ≥ 0.6 [21]. Thus, the association of two differential genes is defined as the weight of the edge, and only genes with the value of top 10% PCC and |PCC| ≥ 0.6 are reserved.

Calculating Differential Value of Genes by Graphlet Vector
Graphlets are small connected non-isomorphic induced subgraphs containing 2, 3, 4, or more nodes [22][23][24]. The graphlets of 2-4 nodes are shown in Figure 2. For 2, 3, and 4-node graphlets, the nodes in same color mean the nodes with the same topological structure (degree). There are 15 different kinds of nodes labelled orbits0-orbits14. Each node in the network obtains specific graphlet vector by calculating the frequency in 15 dimensions. Graphlets are small connected non-isomorphic induced subgraphs containing 2, 3, 4, or more nodes [22][23][24]. The graphlets of 2-4 nodes are shown in Figure 2. For 2, 3, and 4-node graphlets, the nodes in same color mean the nodes with the same topological structure (degree). There are 15 different kinds of nodes labelled orbits0-orbits14. Each node in the network obtains specific graphlet vector by calculating the frequency in 15 dimensions. For the node ∈ , ∈ ′, denotes the i th coordinate of its signature vector, i.e. is the number of times node is touched by an orbit i in . The distance ( , ) between the i th orbits of nodes and is defined as [25]: For the node u ∈ V, u ∈ V , u i denotes the i th coordinate of its signature vector, i.e. u i is the number of times node u is touched by an orbit i in V. The distance D i (u, u ) between the i th orbits of nodes u and u is defined as [25]: where w i is the weight of orbit i that accounts for dependencies between orbits [25]. As shown in Equation (2), the d-value between nodes u and u means the total distance.
The distance d − value(u, u ) is in (0, 1), where distance 0 means that signatures of nodes u and u are identical [25]. The more topological structure varies, the larger d-value is. Nodes with d-value larger than 0.4 [23] are selected into the differential nodes set D for the further analysis.

The Criterion to Extract Kernel Differential Subgraph
Kernel differential subgraph extraction is similar to Steiner Tree problem, which is an NP-hard problem. In this work, the criterion to extract KDS is present by four principles. Firstly, the subgraph should be connected. A connected subgraph can discover the dense relationship between molecules. Secondly, the scale of subgraph should be as small as possible. A KDS is the most core subgraph with small scale of the entire network. Thirdly, the d-value of nodes with large topological difference calculated by graphlet should be as large as possible. Nodes with large differences in topology are often key nodes in the network. These nodes will be selected to extract the KDS. Fourthly, the functional relevance between genes should be as strong as possible. It means the higher weight of edges will be chosen to extract the KDS.
There is a cancer network G(V, E) and a normal network G (V, E ) representing two different states. V represents the set of v common nodes; E and E represent the set of edges respectively. Algorithm 1 describes the criterion to discover the KDS (CKDS), where W e represents the weight set of edges. The set D = {D 1 , D 2 , . . . , D v } represents the set of differential nodes with d-value d = {d 1 , d 2 , . . . , d v }. According to the sorted d-value d in descending order, we selected the differential nodes D v (d v ≥ 0.4) [23] to add in KDS. When considering a new path added to KDS, d v and W e mean the sum of d-value of all nodes and weight of all edges on the path. The parameter a and b were coefficient designed to measure the importance of D v and W e . The estimation of the vector (a, b) is discussed in Section 3. KDS x (KDS 1 and KDS 2 ) indicates the KDS of different state. The pseudo code of this algorithm is shown below. Add if D v directly connect with any node existed in KDS x add D v and its edge to KDS x else if calculate the score of the shortest paths from D v to each node in KDS x by Equation (3), Add the path that has the highest Score path to KDS x End Return KDS x End Intersect KDS 1 and KDS 2 to get KDS of G and G After getting the KDS of G and G , the KDS was constructed by Cytoscape (http://www.cytoscape. org/) [26].

Topological Analyses on Kernel Differential Subgraph
The centrality indexes including degree centrality (DC) [27], betweenness centrality (BC) [28], closeness centrality (CC) [29], and eigenvector centrality (EC) [30] were used to analyze the KDS. For a KDS G = (V, E), V and E represent the set of nodes and edges respectively. Four centrality indexes are defined as follows, DC: DC means how many nodes connected to node v, and it can measure node v's centrality apparently. |N v | is the number of node v's neighbors. The degree of node v is formalized by BC: BC is the average length of the shortest paths through node v. Equation (5) is as follow: In which, σ st is the total number of shortest paths from node s to node t. σ st (v) means the number of those paths that go through node v.
CC: In the network V with n nodes, closeness centrality means the degree that node v communicates with other nodes set t = {t 0 , t 1 , . . . , t m }, 0 ≤ m ≤ n − 1. It is calculated by Equation (6): dist(v, t) is the distance of the shortest path from node v to node t m . EC: EC is a measure of the influence of node v on a network. It assigns relative scores to all nodes in the network based on the concept that connections to high-scoring nodes contribute more to the score of the node in question than equal connections to low-scoring nodes [30]. The EC score of node v is shown as Equation (7): α max is the eigenvector corresponding to the largest eigenvalue from A which is the adjacency matrix of KDS.
Different topology analysis methods rely on different network topology structures, which may not comprehensively balance the importance of genes in different biological states. Therefore, we employed four centrality indexes (one local measurement method 'DC' and three global measurement methods 'BC, CC, and EC'). According to four centrality indexes, four scores of each node in the subgraph was calculated and normalized to the number in the range 0 to 1. Each node would have a score to evaluate the topological differences in Equation (8). Multiple centralities can be considered comprehensively to evaluate the node topology.
In the following study, we focused on the nodes with top 10% score, which were with large topological differences in the KDS.

Functional Enrichment Analyses
The Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses were performed to understand the underlying biological mechanisms. GO analyses explored the biological significance of genes by R package 'clusterProfiler' [31]. The enriched GO terms with Gene-Count > 5 and p-value < 0.05 were selected for further assessment [32]. In this paper, we also focused on the top 10% frequently occurring genes in the GO terms. The KEGG analyses were performed on pathways with p-value < 0.05.

Evaluation Indicators
The number of essential genes in the KDS could evaluate the performance of the algorithm. The more essential genes were found, the better the performance of the algorithm was.
As shown in Equation (9), P KDS is calculated to evaluate the performance. N e means the number of essential genes.N p is the number of essential genes in KDS, and N p is the number of essential genes which are not predicted in KDS. P KDS is similar to the evaluation indicator 'Precision' in binary classification problem.
To better evaluate the performance, true negative (TN), false positive (FP), false negative (FN), and true positive (TP) [33] are used to calculate evaluation indicators, including Accuracy, Precision (P KDS ), Recall, and F1-Score as following.
Moreover, F1-Score is a handy indicator for measuring the accuracy of a binary classification model. F1-Score takes Precision and Recall into account, which ranges from 0 to 1. The algorithm is more excellent if the F1-Score is closer to 1.

Simulated Data Generation
According to the principles of biomolecular network [34], we used a simulated data generating algorithm [35] to generate simulated data.
The algorithm could generate two networks with a list of essential genes and two sets of gene expression based on some parameters. The parameters of n 1 and n 2 mean the number of nodes, and m means the number of essential genes in the two networks. The parameter ρ means the proportion of differential edges driven by perturbed genes [35]. The smaller ρ is, the more difficult it is to find essential genes. In this paper, n 1 = n 2 = 100, m = 10, ρ = 0.1.
Two hundred groups of simulated datasets were generated, in which 100 groups (Dataset I) were to get the vector (a, b) in Equation (3) and 100 groups (Dataset II) were to compare the performance of CKDS with other methods.
As the Equation (3) shows, the score of path is influenced by d v and W e , and parameter a andb were designed to measure the importance of d v and W e . In order to distinguish which variable is more influential, the sum of a andb was designed to be 1. For each of the 100 groups (Dataset I), the parameter a andb were taken from 0 to 1 respectively. Thus, the optimal ratio of a andb can be generated by conducting experiments on resulted KDS's prediction precision by Equation (11).
As shown in Figure 3, the parameter a andb around 0.5 (a : b = 1 : 1) gets the KDS with the highest Precision (P KDS ). It reflects that d v is as important as W e . According to the four principles, the CKDS algorithm considers the connectivity and scale of KDS, the topological difference of nodes and function relevance between genes in the KDS. The reason why the ratio of a to b is 1:1 is that when a new shortest path is added to the KDS, the ratio of the number of the points and edges is 1:1. The newly shortest added path meets four principles very well. The value of d v and W e are between 0 and 1 respectively. The ratio of d v to W e is close to 1.  The experiment results showed that when the ratio of a andb is about to 1, the generated KDS can perform well and acquire the reliable result. According to the four principles of CKDS, the final equation to calculate the path in this paper is shown as Equation (14).

Comparison with other Methods on Simulated Datasets
In our work, we compared CKDS with other three differential kernel subgraph extraction algorithms: SMT-Neurophysiology (KDS-SMT) [12], TDKS [13] and KDS based on Floyd (KDS-Floyd) [36]. Each algorithm would get a KDS with essential genes. One hundred groups of simulated datasets(Dataset II) with 10 essential genes were generated to assess the performance of the four algorithms.
After calculating the evaluation indicators by Equations (10)- (13), the results show CKDS is superior to other three algorithms on those measures (Table 1). It proves that CKDS has a good performance to find KDS with essential genes. This is because that CKDS combines multiple principles, which is capable of taking various kinds of differences into consideration. The raw scRNA-seq data was downloaded from the GEO database. To compare GBM and normal cells, 134 fetal human cortical neurons (FHCN) [37] (GSE67835, 25 June, 2019) and 3589 human glioblastoma cells from Darmanis et al [38] (GSE84465, 25 June, 2019) were downloaded to discover the KDS between two states.
Using two scRNA-seq datasets, differential expressed analyses were performed by 'egdeR' [14]. As shown in Figure 4a, 3547 genes were defined as DEGs. Two networks were constructed by the method illustrated in Section 2.1.2. The GBM network consists of 912 nodes with 1986 edges and the FHCN network consisted of 518 nodes with 594 edges. There were 387 common genes in two networks. The common genes were sorted by calculating the graphlet vector in descending order. Finally, using the CKDS algorithm, the KDS of GBM and FHCN was discovered, consisting of 106 genes with a total of 141 interactions in Figure 4b.

The Analyses of Kernel Differential Subgraph
In order to explore the biological mechanisms of GBM, we used network topology and functional enrichment analysis methods on the extracted KDS. However, there is no golden standard in evaluate KDSs in real bio-network. In this paper, the effectiveness of the method can be accessed by literature mining.
According to four centrality indexes, each node in KDS was calculated by Equation (6). We focused on the top 10% nodes with the highest score in KDS. Eleven genes with large topological differences (TGFB1, ITPKB, HRAS, NFKB1, PML, MYD88, ACTN1, CSF1, GAS6, DAB2 and CSNK2B) were chosen from the KDS of GBM and FHCN in Figure 4(b). Eight of the eleven genes were supported by the literature arguing that they had great influence on GBM. Among them, TGFB1, PML and GAS6 are therapeutic targets for GBM. NFKB1, CSF1 and LYN are regulatory genes which facilitate progression of GBM. MYD88 is a biomarker to divide GBM patient. ACTN1 is regulated during the development of astrocytoma cells. HRAS is a driver gene that expression of oncogenic HRAS results in a malignant phenotype in glioma cell lines (Table 2.).

The Analyses of Kernel Differential Subgraph
In order to explore the biological mechanisms of GBM, we used network topology and functional enrichment analysis methods on the extracted KDS. However, there is no golden standard in evaluate KDSs in real bio-network. In this paper, the effectiveness of the method can be accessed by literature mining.
According to four centrality indexes, each node in KDS was calculated by Equation (6). We focused on the top 10% nodes with the highest score in KDS. Eleven genes with large topological differences (TGFB1, ITPKB, HRAS, NFKB1, PML, MYD88, ACTN1, CSF1, GAS6, DAB2 and CSNK2B) were chosen from the KDS of GBM and FHCN in Figure 4(b). Eight of the eleven genes were supported by the literature arguing that they had great influence on GBM. Among them, TGFB1, PML and GAS6 are therapeutic targets for GBM. NFKB1, CSF1 and LYN are regulatory genes which facilitate progression of GBM. MYD88 is a biomarker to divide GBM patient. ACTN1 is regulated during the development of astrocytoma cells. HRAS is a driver gene that expression of oncogenic HRAS results in a malignant phenotype in glioma cell lines ( Table 2). CSF1: colony stimulating factor 1 CSF1 signaling is oncogenic during gliomagenesis through a mechanism distinct from modulating GAM polarization status.

[45]
GAS6: growth arrest specific 6 represent a potential new approach for glioma treatment 18172262 [46] The eleven genes that top 10% frequently occurred in the enriched GO terms were selected from GBM and FHCN and marked in red in Figure 4(b). Supported by the literature, ten of the eleven genes had great influence on GBM. Among them, EGFR, DAXX, ANXA1, ANXA2 and LYN are regulatory genes which promote glioma growth. HSPA1B, EPHA3, INSR and TGFB1 are functional therapeutic targets in glioblastoma (Table 3). MAP2K1 is enriched in the KEGG pathway(hsa05214) for GBM. TGFB1: transforming growth factor beta 1 the oncogenic MSH6-CXCR4-TGFB1 feedback loop is a novel therapeutic target for GBM 30867843 [39] By KEGG enrichment analysis, there was an enriched KEGG pathway (hsa05214: HRAS, MAP2K1, EGFR and CCND1) for GBM.
In summary, by the topology and functional enrichment analyses on the KDS, seventeen genes (nine regulatory genes, six therapeutic targets, one driver gene, one biomarker) and one pathway were found, which were closely interrelated with GBM. The experiments indicated that the KDS extracted by CKDS reflected the large differences between GBM and FHCN, which highly influenced on the development of GBM. To further explore the effects of neurodevelopmental stages and the development of GBM, the raw scRNA-seq data of neural differentiation about neural stem cell lineages from adult mice, including 152 activated neural stem cells (aNSCs), 64 produce neural progenitor cells (NPCs) and 31 astrocytes (Asts) were downloaded from the reference [55]. Three different stages of neural stem cell lineage are divided to Group A (aNSCs and NPCs) and Group B (NPCs and Asts).
Differential expressed analysis was performed by 'egdeR' packages. 1039 DEGs and 790 DEGs were extracted from two groups respectively (Figure 5a,b). The networks were constructed by Section 2.1.2. In Group A, the aNSCs network consisted of 504 nodes with 1492 edges and the NPCs network consisted 686 nodes with 2682 edges. In Group B, the NPCs network consisted of 544 nodes with 2686 edges and the Asts network consisted of 559 nodes with 2724 edges. There were 485 and 517 common genes in two groups respectively. The common genes in each group were sorted by calculating the graphlet vector in descending order. To further explore the effects of neurodevelopmental stages and the development of GBM, the raw scRNA-seq data of neural differentiation about neural stem cell lineages from adult mice, including 152 activated neural stem cells (aNSCs), 64 produce neural progenitor cells (NPCs) and 31 astrocytes (Asts) were downloaded from the reference [55]. Three different stages of neural stem cell lineage are divided to Group A (aNSCs and NPCs) and Group B (NPCs and Asts).
Differential expressed analysis was performed by 'egdeR' packages. 1039 DEGs and 790 DEGs were extracted from two groups respectively ( Figure 5(a) and Figure 5(b)). The networks were constructed by Section 2.1.2. In Group A, the aNSCs network consisted of 504 nodes with 1492 edges and the NPCs network consisted 686 nodes with 2682 edges. In Group B, the NPCs network consisted of 544 nodes with 2686 edges and the Asts network consisted of 559 nodes with 2724 edges. There were 485 and 517 common genes in two groups respectively. The common genes in each group were sorted by calculating the graphlet vector in descending order.
Using the CKDS algorithm, two KDSs of the two groups were discovered, consisting of 107 genes with 151 interactions in KDS-A and 109 genes with 144 edges in KDS-B, as shown in Figure 5(c) and Figure 5(d).

Kernel Differential Subgraph Analyses
In Group A, according to four centrality indexes, top 10% genes with large topological differences in KDS-A was calculated by Equation (6). Eleven genes (Src, Egfr, Gab1, App, Numb, Plcg1, Efnb3, Ptprk, Actn1, Notch2 and Gsn) were chosen from the KDS-A. The border lines of these genes are bolded in Figure 5c. Supported by the literature (Table 4), eight of the eleven genes have influence on GBM. Among them, Src is a driver gene which inhibit the growth of GBM and reduce its survival. Egfr, Gab1, App and Efnb3 are regulatory genes which promote glioma cell proliferation. Numb has effective anti-cancer therapy in glioblastoma. Plcg1 induces GBM radioresistance. Notch2 and miR-181a have potential prognostic value as tumor biomarkers in GBM patients.
Compare with Group A, only few genes (Hsp90aa1, Eprs and Hsp90ab1) supported by the literature references in KDS-B which have influence on GBM (Table 5). Table 4. The biological functions, corresponding PubMed IDs and literatures for genes with large topological changes between activated neural stem cell (aNSC) and neural progenitor cells (NPC).

Symbol: Gene Name Function Roles in GBM PMID References
Src: SRC proto-oncogene, non-receptor tyrosine kinase Reduce human glioma stem cell migration, invasion, and survival 28712848 [56] Egfr: Epidermal growth factor receptor Promote glioma growth and angiogenesis 22139077 [47] Gab1: GRB2 associated binding protein 1 Promote glioma cell proliferation 30016785 [57] App: amyloid beta precursor protein Promote the proliferation of glioma cells to inhibit the differentiation of glioma cells 28789439 [58] Numb: NUMB endocytic adaptor protein Effective anti-cancer therapy 31116627 [59] Plcg1:  Table 5. The biological functions, corresponding PubMed IDs and literatures for genes with large topological changes between NPC and astrocytes (Ast).

Symbol: Gene Name Function Roles in GBM PMID Reference
Hsp90aa1: heat shock protein 90 alpha family class A member 1 survival signatures in GBM 22952576 [63] Eprs: glutamyl-prolyl-tRNA synthetase the protein coding genes in GBM 30572911 [64] Hsp90ab1: heat shock protein 90 alpha (cytosolic), class B member 1 predict prognosis in astrocytic tumors 27258564 [65] From aNSCs to NPCs stage, top 10% frequently occurring genes in the enriched GO terms (Rab4a, Pten, Egfr, Rab10, Rac1, Fgfr1, Gnai1, Ntrk2, Rhob, Kras and Rhou) were selected to look for the biomarkers. These 11 gene nodes are marked in red in Figure 5c. Supported by the literature, eight of the eleven genes have influence on GBM. Among them, Pten and Rac1 are driver genes which inhibit the migration and invasion of GBM. Egfr, Kras, Gnai1, Ntrk2 and Rhob are regulatory genes which drive the initiation and progression of glioma. Fgfr1 induces GBM Radioresistance. In KDS-B, seven genes (Hsp90aa1, Hsp90ab1, Atp1b2, Trp53, Hspa8, Usp22 and Atp1a2) are supported by the literature references which have influence on GBM (Tables 6 and 7).  Atp1b2: ATPase Na+/K+ transporting subunit beta 2 Na + /K + -ATPase β2-subunit (AMOG) expression abrogates invasion of glioblastoma-derived brain tumor-initiating cells.

[71]
Trp53: tumor protein p53 Induce G1/S phase cell cycle arrest in glioblastoma cells 31001122 [72] Hspa8: heat shock protein family A (Hsp70) member 8 Inhibition of nestin suppresses stem cell phenotype of glioblastomas 25527454 [73] Usp22: ubiquitin specific peptidase 22 Increase the abilities of proliferation, migration and invasion of glioma cells, and promote the growth and development of glioma 30223389 [74] Atp1a2: ATPase, Na+/K+ transporting, alpha 2 polypeptide Induce tumor progression and temozolomide resistance in glioma 27837435 [75] By KEGG enrichment analyses, the KDS enriched lots of KEGG pathways related to cancer, particularly, the KEGG pathway mmu05214 (Egfr, Plcg1, Kras and Pten) is exactly the pathway of GBM.
In summary, the KDS-A involved ten regulatory genes, three driver genes, one biomarkers, one therapeutic target of GBM. These fifteen genes and the KEGG pathway in KDS-A highly influenced on the development of GBM. However, there was few genes and no pathway of GBM in KDS-B.
The topological and functional enrichment analyses indicated the genes and pathways associated with glioma and cancers are significantly reduced during the period from NPC to Ast. It suggests that the critical period of GBM development is from aNSC to NPC other than NPC to Ast.
Gliomas are malignant primary tumors of the central nervous system. Their cell-of-origin is thought to be a neural progenitor or stem cell that acquires mutations leading to oncogenic transformation [76]. By the CKDS algorithm, we proved that the stage of aNSCs to NPCs is a critical period affecting the development of GBM.

Conclusions
Complex biological networks are used to explore the mechanisms in complex diseases. Crucial changes in different networks reflect on the development of living organisms. Therefore, it is significant to discover the KDS leading to drastic changes.
In this work, we developed a criterion to discover KDS called CKDS. The criterion fully considered the factors affecting KDS, including the connectivity and scale of KDS, the topological difference of nodes and function relevance between genes in the KDS. As a result, the CKDS algorithm discovered the KDS in different states.
The CKDS algorithm was applied to simulated datasets and three scRNA-seq datasets including GBM, FHCN, and neural differentiation. Compared with the other state-of-art methods, the CKDS algorithm outperformed in simulated datasets to discover the KDSs. In the scRNA-seq datasets, we performed the network topology and functional enrichment analyses on the extracted KDSs. Many genes, including genetic biomarkers, driver genes, regulatory genes, and therapeutic targets, and pathways in the KDSs are closely interrelated to GBM, indicating that CKDS could express the kernel difference between different states. Moreover, the KEGG pathway of GBM is only in neural differentiation period from aNSC to NPC other than NPC to Ast, indicating that the period from aNSC to NPC is an important neural differentiation period affecting the development of GBM. In addition, the CKDS algorithm provides a unique perspective in identifying cell-type-specific genes and KDSs.
Based on the prediction of CKDS, the genes that were not supported by literature will be verified by conducting a series of biological experiments in the future. Moreover, the CKDS algorithm can be extended to scRNA-seq datasets of other complex diseases for detecting the molecular features of pathogenesis mechanisms and biomarkers.