Figure 1.
Identification of two GBM prognostic subtypes through unsupervised clustering. (A) Workflow of GBM molecular subtyping. The figure outlines the comprehensive workflow for molecular subtyping of glioblastoma multiforme (GBM) using data from the TCGA-GBM cohort, which includes 166 patients. The analysis begins with gene expression data (G1 Gene, n = 58,938) from RNA sequencing (RNA-seq) datasets and integrates pathway data (PW1 Pathway, n = 35,206) from the Molecular Signatures Database (MSigDB). Consensus clustering was performed using the Minkowski distance metric, resampling 80% of the samples ten times, and evaluating statistical significance by calculating mean cluster consensus. Survival analysis of the identified clusters revealed two clusters, C1 (n = 57) and C2 (n = 109), with significant differences in patient prognosis. (B) Kaplan–Meier survival curves for two clusters (CN = 2). The survival curves for cluster C1 (red) and cluster C2 (blue) show significant differences in survival probability (p < 0.0001, HR = 0.61, log rank test). The shaded areas indicate the 95% confidence intervals. The dashed lines indicate the median survival time for each cluster. (C) The consensus clustering matrix for CN = 2. The x-axis and y-axis represent individual patients, with color intensity representing the correlation distance; darker shades indicate higher correlation and lighter shades indicate lower correlation. (D) Cluster silhouette evaluation (C1 in red and C2 in blue). The y-axis represents the silhouette coefficient, measuring an object’s similarity to its own cluster compared to others. The x-axis shows individual samples ordered within each cluster. Average silhouette width is 0.13, indicated by the dashed red line. (E) Principal component analysis (PCA) plot of GBM. The x-axis (PCA1) and y-axis (PCA2) represent the first and second principal components, respectively.
(F) Clinical profile of TCGA-GBM samples, displaying clusters (C1 and C2), gender (male and female), gene expression subtypes (mesenchymal, neural, classical, proneural, unknown), and methylation clusters (Met 1-6, unknown). Heatmaps show the distribution of these features across samples, with the middle panel depicting overall survival (OS) time in days and age at diagnosis for each sample. Color bars represent clusters, gender, gene expression subtypes, methylation clusters, age, and OS time. (G) CoOncoplot of mutation profiles for top six mutated genes (TP53, EGFR, PTEN, TTN, MUC16, and NF1) in clusters C1 (n = 55) and C2 (n = 104). The y-axis lists the mutation frequency percentages for each gene within each cluster.
![Cimb 48 00103 g001 Cimb 48 00103 g001]()
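As a rough illustration of the silhouette coefficient summarized in panel (D), the sketch below computes per-sample silhouette values for a toy data set in plain Python. The points, the labels, and the use of Euclidean distance are illustrative assumptions; they are not the TCGA-GBM data, nor necessarily the distance used for the clustering in this study.

```python
import math

def silhouette_values(points, labels):
    """Return s(i) = (b - a) / max(a, b) for every sample.

    a is the mean distance to the other members of the sample's own cluster,
    b is the smallest mean distance to the members of any other cluster.
    Assumes at least two clusters and no duplicate points.
    """
    values = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # Group the distances from sample i to every other sample by cluster.
        by_cluster = {}
        for j, (q, lab) in enumerate(zip(points, labels)):
            if j != i:
                by_cluster.setdefault(lab, []).append(math.dist(p, q))
        if own not in by_cluster:
            values.append(0.0)  # common convention for single-member clusters
            continue
        a = sum(by_cluster[own]) / len(by_cluster[own])
        b = min(sum(d) / len(d) for lab, d in by_cluster.items() if lab != own)
        values.append((b - a) / max(a, b))
    return values

# Toy example (hypothetical coordinates, two tight and well-separated groups).
toy_points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
toy_labels = ["C1", "C1", "C2", "C2"]
scores = silhouette_values(toy_points, toy_labels)
average_width = sum(scores) / len(scores)
```

Values near 1 indicate well-separated clusters, values near 0 indicate overlapping clusters, and negative values indicate samples that sit closer to another cluster than to their own.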
Figure 2.
Identification of featured biomarkers for C1/C2 subtypes via Weighted Correlation Network Analysis (WGCNA). (A) Workflow and key findings of WGCNA in GBM subtypes. The analysis started with a cohort of GBM patients (n = 166) and normal controls (n = 5) from the TCGA database. Differentially expressed genes (DEGs) were identified: 13,618 DEGs from the comparison between GBM and normal tissues, and 6598 DEGs from the comparison between C1-GBM (n = 57) and C2-GBM (n = 109). The intersection of these two sets yielded 2555 tumor-related DEGs. Univariate Cox regression analysis of these DEGs identified 964 genes with prognostic significance. These DEGs were subjected to WGCNA, resulting in the identification of ten modules, each assigned a different color. Protein interaction network analysis highlighted genes involved in immune system and cell cycle processes. Hub genes identified for each cluster were C1-GBM (IGKV3-11, VAMP8, LAIR1, COL1A2, PLAUR) and C2-GBM (CKAP2L, PRSS51, MAST1, SOX6). (B) Module–module correlation heatmap from WGCNA. Each square shows the correlation between two modules (red, stronger; blue, weaker). Numbers in parentheses indicate the number of genes in each module: Magenta (n = 41), Yellow (n = 180), Turquoise (n = 159), Black (n = 46), Red (n = 7), Grey (n = 28), Brown (n = 39), Green (n = 45), Blue (n = 352), and Pink (n = 67). (C) Correlation of modules and hub genes with subtypes. The heatmap shows correlations between WGCNA modules and subtypes (C1 and C2) along with their corresponding hub genes. Correlation coefficients are color-coded: red for positive and green for negative correlations. Statistical significance is indicated as *** p < 0.001; ns indicates no significant difference. Modules and their hub genes are Magenta (IGKV3-11), Turquoise (LAIR1), Yellow (VAMP8), Black (COL1A2), Red (PLAUR), Brown (CKAP2L), Green (PRSS51), Blue (MAST1), and Pink (SOX6).
(D,E) Protein–protein interaction analysis in the STRING database for the Yellow (D) and Brown (E) modules. (F) Violin plots of hub gene expression levels (Log2 FPKM) in TCGA-GBM samples. Genes include IGKV3-11, LAIR1, VAMP8, CKAP2L, COL1A2, MAST1, PLAUR, PRSS51, and SOX6. Statistical significance is denoted as **** p < 0.0001. (G) Heatmap of correlation between hub gene expression (RNA sequencing scores) and cancer stem cell (CSC) enrichment scores. Correlation coefficients are color-coded, with red for positive and blue for negative correlations. Statistical significance is denoted as * p < 0.05, ** p < 0.01, *** p < 0.001, **** p < 0.0001; ns indicates no significant difference. Hub genes include SOX6, PRSS51, PLAUR, MAST1, COL1A2, CKAP2L, VAMP8, IGKV3-11, and LAIR1. (H) Risk score analysis based on hub genes. Kaplan–Meier survival curve comparing patients stratified into low- and high-risk groups according to risk scores derived from hub gene expression (p < 0.001, HR = 2.33, log rank test). Blue and red lines indicate the low- and high-risk groups, respectively. Shaded areas denote the 95% confidence intervals.
![Cimb 48 00103 g002 Cimb 48 00103 g002]()
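The module sizes listed for panel (B) can be cross-checked against the 964 prognostic genes reported in panel (A). The short sketch below only re-adds the numbers quoted in the caption; the variable names are illustrative.

```python
# Module sizes as listed in the caption of panel (B).
module_sizes = {
    "Magenta": 41, "Yellow": 180, "Turquoise": 159, "Black": 46, "Red": 7,
    "Grey": 28, "Brown": 39, "Green": 45, "Blue": 352, "Pink": 67,
}

# The ten modules should partition the genes passed to WGCNA.
total_genes = sum(module_sizes.values())
largest_module = max(module_sizes, key=module_sizes.get)
print(total_genes, largest_module)
```

The total equals the number of prognostic genes carried forward from the univariate Cox step, so the ten modules account for all of them.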
Figure 3.
Classification model utilizing machine learning on subtype-specific hub genes. (A) Machine learning workflow for subtype classification using neural networks. This diagram outlines the process of training and validating a neural network for classifying the C1 and C2 subtypes using the TCGA-GBM dataset, with external validation using the CGGA-GBM dataset. Key steps include the following: (1) Training data from the TCGA-GBM dataset includes 166 samples, utilizing hub genes as input features over 500 iterations. (2) Neural network configuration involves using four out of five folds for training in each iteration, testing various configurations of hidden layers (1 to 4) and neuron nodes (4 to 8) to determine the optimal network architecture. (3) Internal validation with the remaining fifth fold is used to compare observed subtypes with predicted clusters, employing backpropagation to adjust the network weights based on the validation outcomes. (4) The final classifier for C1/C2 classification consists of one hidden layer with five neurons. (5) External validation using the CGGA-GBM dataset includes 85 samples. (B) Internal validation of the TCGA-GBM dataset (n = 166) using 5-fold cross-validation. The dataset (n = 166) undergoes five-fold cross-validation, where one fold serves as the test set in each iteration and the others as training sets, generating validation errors (E1 to E5). This process is repeated over 500 iterations, with the mean error (E) calculated from all iterations. (C) Bar plot of neural network performance metrics with different hidden layer and neuron counts. (D) Neural network architecture for subtype classification. The network consists of an input layer with 9 features (SOX6, PRSS51, MAST1, CKAP2L, PLAUR, COL1A2, VAMP8, LAIR1, and IGKV3-11), a single hidden layer with 5 neurons, and an output layer classifying into C1 (red) and C2 (blue). (E) Bar plot of hub gene feature importance in the TCGA-GBM dataset.
The importance of genes is determined by the BP neural network classifier. SOX6 ranks as the most influential, followed by CKAP2L, MAST1, PRSS51, IGKV3-11, COL1A2, PLAUR, VAMP8, and LAIR1. The color intensity of the bars reflects gene feature importance (higher importance is shown with darker colors). (F) Kaplan–Meier survival curves for classified clusters. The survival curves for cluster C1 (red, n = 26) and cluster C2 (blue, n = 59) are obtained using the BP neural network classifier on the external CGGA-GBM dataset (p = 0.02, HR = 0.55, log rank test), with shaded areas indicating the 95% confidence intervals. The dashed lines indicate the median survival time. (G) PCA of C1 and C2 subtypes in the external CGGA-GBM dataset classified by the BP neural network classifier, illustrating the distribution of GBM samples. The x-axis (PCA1) and y-axis (PCA2) represent the first and second principal components, respectively. (H) Violin plots of hub gene expression levels (Log2 FPKM) in CGGA-GBM samples. The genes include IGKV3-11, LAIR1, VAMP8, CKAP2L, COL1A2, MAST1, PLAUR, PRSS51, and SOX6, which show expression patterns consistent with those in TCGA-GBM. Statistical significance is indicated as *** p < 0.001.
![Cimb 48 00103 g003 Cimb 48 00103 g003]()
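The repeated five-fold cross-validation scheme described for panel (B) can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code: the function names, the seeding scheme, and the `fit_and_score` callback are hypothetical, and the neural network itself is not implemented here.

```python
import random

def five_fold_indices(n_samples, n_folds=5, seed=0):
    """Shuffle sample indices and split them into n_folds near-equal folds."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    # Fold k takes every n_folds-th index starting at k, so sizes differ by at most 1.
    return [indices[k::n_folds] for k in range(n_folds)]

def repeated_cv_error(n_samples, fit_and_score, n_folds=5, n_repeats=500):
    """Mean validation error over n_repeats rounds of n_folds-fold cross-validation.

    fit_and_score(train_idx, test_idx) is a hypothetical callback that trains a
    model on train_idx and returns its error on test_idx.
    """
    errors = []
    for repeat in range(n_repeats):
        folds = five_fold_indices(n_samples, n_folds, seed=repeat)
        for k, test_idx in enumerate(folds):
            # Every fold except fold k is used for training.
            train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
            errors.append(fit_and_score(train_idx, test_idx))
    return sum(errors) / len(errors)

# With 166 samples, one fold holds 34 samples and the other four hold 33 each.
folds = five_fold_indices(166)
fold_sizes = sorted(len(f) for f in folds)
```

In each round every sample is held out exactly once, so the errors E1 to E5 are computed on disjoint test sets before being averaged across rounds.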
Figure 4.
Differentially expressed gene analyses indicate characteristics of C1 and C2 subtypes. (A) Volcano plot of differentially expressed genes (DEGs) between C1 (n = 1181) and C2 (n = 1374) subtypes in the TCGA-GBM dataset. The x-axis represents the Log2 FC in gene expression between the two subtypes, while the y-axis shows the negative logarithm (base 10) of the false discovery rate (−Log10 FDR). Each dot represents a gene, color-coded by expression level from blue (low expression) to red (high expression). Vertical dashed lines indicate the threshold for significant differential expression (|Log2 FC| > 1). Genes to the right of the dashed line are upregulated in C1, while genes to the left are upregulated in C2. (B) Circular plot of the KEGG pathway analysis of DEGs between C1 and C2 subtypes in the TCGA-GBM dataset. (C) Dot plot of KEGG pathway enrichment analysis of DEGs. The x-axis represents the gene ratio, which measures the proportion of DEGs involved in a specific pathway relative to the total number of DEGs. The y-axis lists the significantly enriched KEGG pathways. Each dot represents a KEGG pathway, with the dot size indicating the count of DEGs involved. (D) Gene Ontology (GO) enrichment analysis of DEGs in the TCGA-GBM dataset. The x-axis represents the −Log10 FDR value, indicating enrichment significance. The y-axis lists the significantly enriched GO terms categorized into Biological Processes (BPs, red), Cellular Components (CCs, blue), and Molecular Functions (MFs, green). Dot size reflects the number of DEGs associated with each GO term, with larger dots indicating a higher count of genes. (E) Hallmark gene set analysis comparing C1 and C2 subtypes with normal controls. The x-axis represents different hallmark pathways and the y-axis lists the samples categorized as C1, C2, and normal. Statistical significance is denoted as *** p < 0.001. (F) Evaluation of pathway enrichment scores.
This panel examines the enrichment score for the IL6_JAK_STAT3_SIGNALING and G2/M_CHECKPOINT pathways in the CGGA-GBM dataset. Statistical significance is denoted as *** p < 0.001. (G) Comparative analysis of cancer driver gene abundance between C1 and C2 subtypes in the TCGA-GBM dataset, with the bar plot showing the Log2 FC in the expression of cancer driver genes between C1 (red) and C2 (blue) subtypes.
![Cimb 48 00103 g004 Cimb 48 00103 g004]()
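The thresholding behind the volcano plot in panel (A) can be sketched as below. The |Log2 FC| > 1 cut-off comes from the caption; the FDR cut-off of 0.05, the function names, and the example values are assumptions made only for illustration.

```python
import math

def classify_gene(log2_fc, fdr, fc_cutoff=1.0, fdr_cutoff=0.05):
    """Label a gene by its position in the volcano plot.

    fc_cutoff follows the caption (|Log2 FC| > 1); fdr_cutoff is an assumed
    value, since the caption does not state the FDR threshold.
    """
    if fdr < fdr_cutoff and log2_fc > fc_cutoff:
        return "up_in_C1"   # right of the right-hand dashed line
    if fdr < fdr_cutoff and log2_fc < -fc_cutoff:
        return "up_in_C2"   # left of the left-hand dashed line
    return "not_significant"

def volcano_y(fdr):
    """Y-axis coordinate of the volcano plot: -log10(FDR)."""
    return -math.log10(fdr)

# Hypothetical (log2 FC, FDR) pairs; these are not values from the study.
example_genes = {"gene_a": (2.3, 1e-6), "gene_b": (-1.8, 1e-4), "gene_c": (0.4, 0.2)}
calls = {name: classify_gene(fc, fdr) for name, (fc, fdr) in example_genes.items()}
```

Under this convention a positive Log2 FC means higher expression in C1 than in C2, matching the orientation described for the x-axis.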
Figure 5.
C1-GBM exhibits an immune-infiltrated (“hot”) tumor microenvironment with checkpoint upregulation. (A) Immune phenotype of C1 and C2 subtypes in the TCGA-GBM dataset. The left panel uses UMAP (Uniform Manifold Approximation and Projection) for dimensionality reduction to display the distribution of “hot” and “cold” tumors, with “hot” tumors in red squares and “cold” tumors in green squares. The right panel displays C1 and C2 subtypes, using the same UMAP dimensions: C1 is in red triangles and C2 is in blue triangles. (B,C) Violin plots of the enrichment scores for immune, stromal, and microenvironment (immune + stroma) components in C1 (red) and C2 (blue) subtypes in both the TCGA-GBM and CGGA-GBM datasets. Statistical significance is denoted as * p < 0.05, **** p < 0.0001. (D) Differential cell type enrichment in the TCGA-GBM dataset via xCell. The y-axis represents the log2 FC in cell type abundance, with positive values indicating higher enrichment in C1 and negative values indicating higher enrichment in C2. The x-axis lists the cell types analyzed, and dot size represents FDR, with smaller dots indicating lower FDR values. (E) Differential cell type enrichment in the CGGA-GBM dataset via xCell. The y-axis represents the enrichment score, while the x-axis lists the cell types analyzed. Statistical significance is denoted as *** p < 0.001, **** p < 0.0001. (F) Differential cell type enrichment in the TCGA-GBM dataset via quanTIseq. The y-axis represents the enrichment score, while the x-axis lists the cell types analyzed. Statistical significance is denoted as **** p < 0.0001; ns indicates no significant difference. (G) Differential cell type enrichment in the TCGA-GBM dataset via CIBERSORT. Statistical significance is denoted as * p < 0.05, ** p < 0.01, *** p < 0.001. (H) Immune phenoscore enrichment in C1 and C2 subtypes in the TCGA-GBM dataset.
The y-axis represents the enrichment score for immune components, including MHC molecules, effector cells, suppressor cells, and checkpoints. Statistical significance is denoted as **** p < 0.0001. In subfigures (E–H), red and blue indicate C1 and C2, respectively. In subfigures (D–G), cell types shown in red text denote overlapping findings identified across different methods. (I) Potential mechanisms of the C1 subtype.
![Cimb 48 00103 g005 Cimb 48 00103 g005]()
Figure 6.
C2-GBM exhibits a tumor-intrinsic proliferative state with G2/M checkpoint activation. (A) Gene expression heatmap of DEGs involved in the G2/M checkpoint. Red, blue, and green represent C1, C2, and Normal, respectively. (B) Correlation study between CELL_CYCLE genes and G2/M checkpoint genes. Statistical significance is denoted as **** p < 0.0001. (C,D) Comparative expression analysis of five key genes at the G2/M checkpoint in TCGA-GBM and CGGA-GBM. Statistical significance is denoted as * p < 0.05, ** p < 0.01, *** p < 0.001, **** p < 0.0001. (E) Potential mechanisms associated with the C2 subtype.
Figure 7.
Establishment of a GBM survival model. (A) Comparison of clinical and molecular characteristics between C1 (left, n = 98) and C2 (right, n = 68) subtypes, sorted by overall survival (OS) time. The top panel displays age distribution, followed by OS time in days, with a heatmap indicating OS duration (darker shades indicate longer survival). Subsequent panels show gender distribution and IDH1 mutation status. The heatmap displays microenvironment scores, gene expression levels for IL6 pathway genes (OSMR, STAT3, MYD88, IL6ST, and SOCS3) in C1, and G2/M checkpoint pathway genes (ABRAXAS1, UBE2V2, PSMF1, PSMA8, and KAT5) in C2. (B) Constructed nomogram model predicting the 1- and 3-year survival probabilities for patients with C1 (left) and C2 (right) risk scores. Density plots display the distribution of age, IL6 pathway genes (OSMR, STAT3, MYD88, IL6ST, and SOCS3) in C1, and G2/M checkpoint pathway genes (ABRAXAS1, UBE2V2, PSMF1, PSMA8, and KAT5) in C2. The distribution of the categorical variable (gender) is reflected by the size of the circle. Risk stratification was performed using the R package maxstat to determine optimal cut-off points for total points derived from a Cox model, dividing patients into low-, middle-, and high-risk groups. These cut-offs were calculated through stepwise maximally selected rank statistics, ensuring alignment with survival outcomes. (C,D) Analysis of the area under the curve (AUC) for 1-year and 3-year prognostic predictions using the nomogram model of C1/C2 in the TCGA-GBM dataset. The dashed diagonal line indicates the no-discrimination reference (random classifier; AUC = 0.5). (E,F) TCGA-GBM patients in the cohort at different risks stratified according to the nomogram of C1/C2 subtype. In C1-GBM, the sample sizes were as follows: low-risk group (n = 8), middle-risk group (n = 47), and high-risk group (n = 39).
In C2-GBM, the middle-risk group with one patient was merged into the low-risk group, resulting in sample sizes of low-risk group (n = 10) and high-risk group (n = 52). NA values were removed when calculating the cutoff values for low-, middle-, and high-risk groups in both C1-GBM and C2-GBM subtypes. This resulted in a reduction in the sample sizes allocated to each risk group.
Figure 8.
Sensitive drug selection for immune-related C1 and tumor driver-related C2 subtypes. (A) Drug-sensitivity (responsive) and drug-resistance (non-responsive) gene sets for Pembrolizumab and anti-PD-1/PD-L1 from CTR-DB. The box plot depicts the enrichment scores for response prediction to anti-PD-1/PD-L1 therapy based on two gene expression datasets: GSE135222 and GSE78220. The groups are divided into responders (RPs) and non-responders (No-RPs) to the treatment. Statistical significance is denoted as *** p < 0.001; ns indicates no significant difference. Red and blue boxes represent C1 and C2, respectively. The dashed line separates the two datasets. (B) IC50 values and area under the curve (AUC) for drugs targeting LCK in GBM cell lines from the GDSC database. (C) IC50 values (filtered by maximum concentration) and AUC for drugs targeting AURKA, CDK1, and PLK in GBM cell lines from the GDSC database. (D) Ranking of Log2 FC of prognostic DEGs between C1 and C2 subtypes. The y-axis represents the Log2 FC of C1 vs. C2, with positive values indicating higher expression in C1 and negative values indicating higher expression in C2. Notable genes include F13A1, CHI3L1, CD70, CXCL13, and FCGR2B, upregulated in C1, and VGF and HOXA2, upregulated in C2. (E) The number of small-molecule drug interactions for C1 (red) and C2 (blue). The grey bars indicate the number of drugs in each set: 88 drugs unique to C1, 21 drugs common to both subtypes, and 17 drugs unique to C2. The left bar plot provides the total number of drugs associated with each subtype: 109 for C1 and 38 for C2. (F,G) Gene targets of small-molecule drugs in C1 and C2. (H) IC50 values for GBM cell lines treated with Methotrexate, Cisplatin, and Cytarabine. (I) Distribution of IC50 values by tissue type in GBM cell lines for Methotrexate, Cisplatin, and Cytarabine (IC50: 50% inhibitory concentration; max conc: maximum screening concentration in µM; AUC: area under the curve).
Table 1.
Clinical features of C1 and C2 subtypes.
| Characteristics | C1 (n = 57) | C2 (n = 109) | Total (n = 166) | p-Value |
|---|---|---|---|---|
| OS time | | | | p < 0.001 |
| Mean ± SD | 352.88 ± 333.62 | 466.52 ± 406.59 | 427.50 ± 385.89 | |
| Median | 313.00 | 425.00 | 360.00 | |
| Age | | | | ns |
| Mean ± SD | 62.19 ± 12.15 | 58.71 ± 13.41 | 59.94 ± 13.04 | |
| Median | 62.77 | 60.01 | 60.80 | |
| Gender | | | | ns |
| Male | 37 (65%) | 70 (64%) | 107 | |
| Female | 20 (35%) | 39 (36%) | 59 | |
| GeneExp Subtype | | | | p < 0.001 |
| Classical | 4 (7%) | 37 (34%) | 41 | |
| Mesenchymal | 35 (61%) | 21 (19%) | 56 | |
| Neural | 15 (26%) | 13 (12%) | 28 | |
| Proneural | 1 (2%) | 37 (34%) | 38 | |
| DNA methylation | | | | p < 0.001 |
| Cluster 1 | 11 (19%) | 3 (3%) | 14 | |
| Cluster 2 | 13 (23%) | 16 (15%) | 29 | |
| Cluster 3 | 12 (21%) | 20 (18%) | 32 | |
| Cluster 4 | 1 (2%) | 24 (22%) | 25 | |
| Cluster 5 | 0 | 8 (7%) | 8 | |
| Cluster 6 | 0 | 13 (12%) | 13 | |
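
The association between cluster and gene expression subtype reported in Table 1 can be checked by hand. The sketch below is only an illustration: the source does not state which test produced the p-values, so a Pearson chi-square test of independence is assumed here, and samples with unknown subtype are excluded. The function name `pearson_chi_square` is hypothetical, and the critical value 16.266 is the standard chi-square threshold for 3 degrees of freedom at α = 0.001.

```python
# Observed counts from Table 1, GeneExp Subtype rows: (C1, C2).
# Samples with unknown subtype are not listed in the table and are excluded.
observed = {
    "Classical": (4, 37),
    "Mesenchymal": (35, 21),
    "Neural": (15, 13),
    "Proneural": (1, 37),
}


def pearson_chi_square(table):
    """Return (statistic, degrees of freedom) for a contingency table.

    `table` maps a row label to a tuple of counts, one per column.
    Assumed test: Pearson chi-square of independence (not confirmed by the source).
    """
    rows = list(table.values())
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(col_totals)
    statistic = 0.0
    for row in rows:
        row_total = sum(row)
        for obs, col_total in zip(row, col_totals):
            # Expected count under independence of row and column.
            expected = row_total * col_total / grand_total
            statistic += (obs - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return statistic, dof


chi2, dof = pearson_chi_square(observed)

# Chi-square critical value for dof = 3 at alpha = 0.001.
CRITICAL_0_001_DF3 = 16.266
significant = chi2 > CRITICAL_0_001_DF3
print(f"chi2 = {chi2:.2f}, dof = {dof}, exceeds 0.001 threshold: {significant}")
```

Under these assumptions the statistic lies well above the threshold, which is consistent with the p < 0.001 entry in the table; the excess is driven mainly by the Mesenchymal and Proneural rows.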
Table 2.
IC50 values of GBM cell lines for Methotrexate, Cisplatin, and Cytarabine (IC50: 50% inhibitory concentration; max conc: maximum screening concentration in µM; AUC: area under the curve).
| Cell Line | TCGA Tumor | IC50 (Filtered by Max Conc) | AUC | Drug |
|---|---|---|---|---|
| 42-MG-BA | GBM | 0.02 | 0.44 | Methotrexate |
| SK-MG-1 | GBM | 0.05 | 0.54 | Methotrexate |
| SF268 | GBM | 0.10 | 0.63 | Methotrexate |
| LN-18 | GBM | 0.19 | 0.72 | Methotrexate |
| AM-38 | GBM | 0.19 | 0.71 | Methotrexate |
| LNZTA3WT4 | GBM | 0.22 | 0.73 | Methotrexate |
| DBTRG-05MG | GBM | 0.39 | 0.72 | Methotrexate |
| U251 | GBM | 0.84 | 0.84 | Methotrexate |
| GB-1 | GBM | 3.95 | 0.81 | Cisplatin |
| 42-MG-BA | GBM | 7.01 | 0.89 | Cisplatin |
| 8-MG-BA | GBM | 7.59 | 0.91 | Cisplatin |
| 8-MG-BA | GBM | 0.63 | 0.79 | Cytarabine |
| U-118-MG | GBM | 1.51 | 0.81 | Cytarabine |
| YH-13 | GBM | 1.76 | 0.85 | Cytarabine |
| AM-38 | GBM | 1.94 | 0.86 | Cytarabine |
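
To compare the three drugs at a glance, the rows of Table 2 can be grouped by drug and summarized. The following is a minimal sketch using only the values listed in the table; the variable names (`table2`, `median_ic50`) are illustrative, and the median is chosen here as a simple summary, not as the statistic used by the authors.

```python
from collections import defaultdict
from statistics import median

# (cell line, drug, IC50) rows copied from Table 2; IC50 in the units reported there.
table2 = [
    ("42-MG-BA", "Methotrexate", 0.02),
    ("SK-MG-1", "Methotrexate", 0.05),
    ("SF268", "Methotrexate", 0.10),
    ("LN-18", "Methotrexate", 0.19),
    ("AM-38", "Methotrexate", 0.19),
    ("LNZTA3WT4", "Methotrexate", 0.22),
    ("DBTRG-05MG", "Methotrexate", 0.39),
    ("U251", "Methotrexate", 0.84),
    ("GB-1", "Cisplatin", 3.95),
    ("42-MG-BA", "Cisplatin", 7.01),
    ("8-MG-BA", "Cisplatin", 7.59),
    ("8-MG-BA", "Cytarabine", 0.63),
    ("U-118-MG", "Cytarabine", 1.51),
    ("YH-13", "Cytarabine", 1.76),
    ("AM-38", "Cytarabine", 1.94),
]

# Group IC50 values by drug.
ic50_by_drug = defaultdict(list)
for _cell_line, drug, ic50 in table2:
    ic50_by_drug[drug].append(ic50)

# Median IC50 and number of listed cell lines per drug.
median_ic50 = {drug: median(values) for drug, values in ic50_by_drug.items()}
n_lines = {drug: len(values) for drug, values in ic50_by_drug.items()}

# Print drugs from lowest to highest median IC50.
for drug in sorted(median_ic50, key=median_ic50.get):
    print(f"{drug}: n = {n_lines[drug]}, median IC50 = {median_ic50[drug]:.3f}")
```

Among the cell lines listed, Methotrexate has the lowest median IC50 and Cisplatin the highest; with only three to eight cell lines per drug, this ordering is descriptive rather than a statistical comparison.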