1. Introduction
Globally, gastric cancer (GC) ranks fifth in incidence among all malignancies and is the third most common cause of cancer-related death [1]. Although the five-year survival rate of GC has gradually improved over recent decades, it remains below 30% [2]. Most patients present with advanced-stage disease, where available treatment options have limited efficacy against metastasis, resulting in poor clinical outcomes. Therefore, a comprehensive understanding of the molecular and genetic mechanisms underlying GC is essential for elucidating its pathogenesis, identifying novel therapeutic targets, and ultimately improving patient prognosis.
The tumor microenvironment (TME) constitutes the internal milieu that supports tumor cell survival and progression. TME-related factors, particularly hypoxia, are closely associated with tumor growth, therapeutic resistance, and metastasis [3]. Accumulating evidence indicates that hypoxic microenvironments are a common feature of solid tumors, with hypoxia-related genes frequently enriched in malignancies such as head and neck cancer, lung cancer, and cervical squamous cell carcinoma [4,5]. In recent years, hypoxia has been increasingly recognized as a critical contributor to the initiation and progression of GC [6,7]. However, the roles of hypoxia-related genes in specific GC cell subtypes and their prognostic value in GC patients remain incompletely understood. With the rapid expansion of public genomic databases, it has become increasingly feasible to systematically identify novel hypoxia-associated biomarkers in GC.
Single-cell RNA sequencing (scRNA-seq) enables the reverse transcription, amplification, and sequencing of the transcriptomes of individual cells, followed by comprehensive bioinformatic analysis [8]. This technology has been widely applied in cancer research to resolve intratumoral heterogeneity at single-cell resolution. GC exhibits substantial structural and cellular complexity, with malignant cells embedded within a diverse TME composed of epithelial cells, fibroblasts, endothelial cells, and immune cells [9]. At this resolution, scRNA-seq allows accurate identification of diverse cell populations and delineation of intratumoral heterogeneity, thereby facilitating a deeper understanding of how GC heterogeneity influences tumor progression and clinical outcomes. Importantly, scRNA-seq also enables the identification of hypoxic cell subpopulations and hypoxia-associated transcriptional programs within tumor tissues, providing a powerful approach to dissect hypoxia-driven cellular heterogeneity in GC [10]. Although several hypoxia-based prognostic signatures for GC have been proposed using bulk transcriptomic data and integrated single-cell analyses, the cellular origins and biological contexts underlying these signatures remain incompletely characterized [11,12,13].
Weighted gene co-expression network analysis (WGCNA) is a network-based strategy for identifying groups of genes with similar expression profiles across multiple samples. This approach clusters highly correlated genes into distinct modules and evaluates their associations with clinical traits or phenotypes, enabling the identification of key biomarker genes and potential therapeutic targets [14]. In the context of hypoxia, WGCNA offers an effective framework for identifying co-expressed gene modules that correlate with hypoxia scores and clinically relevant traits, thereby facilitating the systematic prioritization of hypoxia-driven transcriptional programs. Previous studies have applied WGCNA to derive hypoxia-related prognostic models in GC and to characterize gene modules associated with prognosis or molecular subtypes [13,15,16]. However, many of these models rely on relatively large gene sets and lack validation at the single-cell level, limiting their clinical tractability and biological interpretability. In contrast, our analysis uses ssGSEA-derived hypoxia scores to guide module selection and further maps the prioritized hypoxia-related modules to specific cellular compartments using single-cell transcriptomic data, thereby providing a cell-type-resolved interpretation of hypoxia-associated gene networks.
In the present study, we integrated ssGSEA-derived hypoxia scoring, WGCNA-based module prioritization, and Cox and LASSO regression modeling to systematically identify hypoxia-associated prognostic biomarkers in GC. We derived and validated a parsimonious four-gene panel (SPARC, AXL, NRP1, and VCAN) across independent cohorts and further mapped its expression to cancer-associated fibroblasts (CAFs) at single-cell resolution. By combining bulk co-expression network analysis with single-cell transcriptomics, our study provides a cell-type-resolved framework for hypoxia-associated prognostic modeling and offers new insights into the biological basis of hypoxia-driven GC progression.
2. Materials and Methods
2.1. Data Processing
The GC transcriptomics dataset (GSE84437) and GC single-cell RNA sequencing dataset (GSE163558) were downloaded from the Gene Expression Omnibus (GEO) of the National Center for Biotechnology Information (NCBI) (https://www.ncbi.nlm.nih.gov/geo/) (accessed on 2 February 2026). GSE84437 is a microarray dataset generated on the Illumina HumanHT-12 V3.0 expression beadchip platform (GPL6947). Raw files were processed using standard quantile normalization, followed by log2 transformation, to obtain normalized expression matrices for subsequent analyses. GSE163558 is a single-cell RNA sequencing dataset generated on the Illumina NovaSeq 6000 platform (GPL24676). In addition, the GC cohort from The Cancer Genome Atlas (TCGA) was obtained from the Genomic Data Commons portal (https://portal.gdc.cancer.gov/) (accessed on 2 February 2026) in HTSeq-FPKM format. Genes with low expression (FPKM < 1) in more than 50% of samples were filtered out, and the remaining FPKM values were converted to transcripts per million (TPM), followed by log2(TPM + 1) transformation for subsequent analyses. After data normalization, samples were excluded if overall survival (OS) data were unavailable, key clinicopathological variables (including age or TNM stage) were missing, or the follow-up duration was less than 30 days. Only patients with complete survival data and essential clinical annotations were retained for subsequent analyses. The TCGA-GC cohort was used as the training set, whereas the independent GEO dataset GSE84437 served as an external validation cohort. Consequently, a total of 434 samples from GSE84437, 3 GC samples from GSE163558, and 380 samples from the TCGA dataset were included in the final analyses. The overall workflow of the study is illustrated in Figure 1.
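Within each sample, TPM is a rescaling of FPKM: TPM_g = FPKM_g / (Σ_h FPKM_h) × 10^6. The log2(TPM + 1) transform is then applied to the result. The Python sketch below shows the filter-then-convert order described above, under simplifying assumptions. Expression is held in a plain dictionary that maps each gene to a list of per-sample FPKM values, and the function names are hypothetical. It illustrates the arithmetic and is not the authors' pipeline.

```python
import math
from typing import Dict, List

Matrix = Dict[str, List[float]]  # hypothetical layout: gene -> per-sample values


def filter_low_expression(fpkm: Matrix, min_fpkm: float = 1.0,
                          max_low_fraction: float = 0.5) -> Matrix:
    """Drop genes with FPKM < min_fpkm in more than max_low_fraction of samples."""
    kept = {}
    for gene, values in fpkm.items():
        low = sum(1 for v in values if v < min_fpkm)
        if low / len(values) <= max_low_fraction:
            kept[gene] = values
    return kept


def fpkm_to_log2_tpm(fpkm: Matrix) -> Matrix:
    """Rescale each sample's FPKM values to sum to 1e6 (TPM), then apply log2(TPM + 1)."""
    genes = list(fpkm)
    n_samples = len(fpkm[genes[0]])
    # Per-sample FPKM totals over the genes that survived filtering.
    totals = [sum(fpkm[g][j] for g in genes) for j in range(n_samples)]
    return {
        g: [math.log2(fpkm[g][j] / totals[j] * 1e6 + 1.0) for j in range(n_samples)]
        for g in genes
    }
```

For example, a gene with FPKM values [0.5, 0.2, 3.0] is removed because two of its three values fall below 1. A gene with [0.5, 2.0, 3.0] is retained. Because TPM is computed after filtering in this sketch, the per-sample totals cover only the retained genes.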
2.2. Single-Sample Gene Set Enrichment Analysis (ssGSEA)
Single-sample GSEA (ssGSEA) was applied via the GSVA R package (v1.48.3) to calculate hypoxia enrichment scores across GC samples, using the MSigDB HALLMARK_HYPOXIA gene set. According to the median ssGSEA score within each cohort (TCGA-GC and GSE84437), GC samples were stratified into high-hypoxia and low-hypoxia groups for subsequent analyses.
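The ssGSEA method in GSVA scores each sample with a rank-weighted random walk. Genes are ordered by expression, members of the gene set advance a weighted "hit" curve, and all other genes advance a uniform "miss" curve. The Python sketch below is a simplified version of that statistic, followed by the within-cohort median split described above. It assumes a weight exponent of 0.25 (GSVA's default for ssGSEA; the text does not state it), ignores ties, and omits GSVA's optional normalization of scores by their range across samples, so absolute values will differ from GSVA output. The function names and dictionary layout are hypothetical.

```python
from statistics import median
from typing import Dict, Sequence


def ssgsea_score(expr: Dict[str, float], gene_set: Sequence[str], alpha: float = 0.25) -> float:
    """Simplified single-sample enrichment score for one sample.

    Genes are ordered by decreasing expression; set members add rank**alpha
    weighted steps to the hit curve, non-members add uniform steps to the miss
    curve, and the score is the sum over positions of (hit - miss).
    """
    ordered = sorted(expr, key=expr.get, reverse=True)
    n = len(ordered)
    in_set = set(gene_set) & set(expr)
    if not in_set or len(in_set) == n:
        raise ValueError("gene set must overlap the measured genes only partially")
    # Rank n for the most highly expressed gene, 1 for the lowest (ties ignored).
    weights = {g: (n - i) ** alpha for i, g in enumerate(ordered)}
    hit_total = sum(weights[g] for g in in_set)
    miss_step = 1.0 / (n - len(in_set))
    p_hit = p_miss = score = 0.0
    for g in ordered:
        if g in in_set:
            p_hit += weights[g] / hit_total
        else:
            p_miss += miss_step
        score += p_hit - p_miss
    return score


def median_split(scores: Dict[str, float]) -> Dict[str, str]:
    """Label samples 'high' if above the cohort median, else 'low' (tie rule assumed)."""
    cut = median(scores.values())
    return {s: ("high" if v > cut else "low") for s, v in scores.items()}
```

In use, each sample in a cohort would be scored against the HALLMARK_HYPOXIA genes, and `median_split` would then be applied separately to the TCGA-GC and GSE84437 score dictionaries.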
2.3. Weighted Gene Co-Expression Network Analysis
WGCNA was performed on the TCGA-GC dataset using the WGCNA R package to identify stable gene modules associated with hypoxia [10]. Low-expression genes (CPM < 1 in more than 50% of samples) were removed prior to network construction. The soft-thresholding power (β) was determined following the standard WGCNA procedure by evaluating the scale-free topology fit index across a series of candidate values. The smallest β achieving approximate scale-free topology (R² > 0.8) with reasonable mean connectivity was chosen. Genes were then clustered to construct the co-expression network, which was divided into modules, and modules with similar expression profiles were merged. Module–trait relationships were assessed by Pearson correlation between module eigengenes and the hypoxia ssGSEA score, yielding a module–trait correlation matrix. The modules showing the strongest correlation with the hypoxia phenotype were carried forward for further investigation.
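WGCNA selects β by checking how closely the network's connectivity distribution follows a power law. Connectivity is k_i = Σ_{j≠i} |cor_ij|^β. The k values are binned, and log10 p(k) is regressed on log10 k; a good scale-free fit has a negative slope and a high R². The Python sketch below is a simplified stand-in for that check and for the "smallest β with R² > 0.8" rule described above. It assumes an unsigned network, uses equal-width bins, and skips empty bins. It is not the WGCNA package's `pickSoftThreshold` implementation, and the mean-connectivity consideration is left to the analyst.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def connectivity(cor: Sequence[Sequence[float]], beta: float) -> List[float]:
    """Whole-network connectivity k_i = sum over j != i of |cor_ij| ** beta (unsigned)."""
    n = len(cor)
    return [sum(abs(cor[i][j]) ** beta for j in range(n) if j != i) for i in range(n)]


def scale_free_fit(k: Sequence[float], n_bins: int = 10) -> Tuple[float, float]:
    """Return (signed R^2, slope) of log10 p(k) regressed on log10 of mean k per bin."""
    lo, hi = min(k), max(k)
    if lo <= 0:
        raise ValueError("connectivity must be positive to take logarithms")
    width = (hi - lo) / n_bins or 1.0
    bins: List[List[float]] = [[] for _ in range(n_bins)]
    for v in k:
        bins[min(int((v - lo) / width), n_bins - 1)].append(v)
    xs, ys = [], []
    for b in bins:
        if b:  # simplification: empty bins are skipped rather than padded
            xs.append(math.log10(sum(b) / len(b)))
            ys.append(math.log10(len(b) / len(k)))
    if len(xs) < 2:
        return 0.0, 0.0
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0, 0.0
    slope = sxy / sxx
    r2 = sxy * sxy / (sxx * syy)
    # A scale-free network has a negative slope, reported here as a positive signed R^2.
    return -math.copysign(r2, slope), slope


def pick_soft_threshold(fit_by_beta: Dict[float, float], cutoff: float = 0.8) -> Optional[float]:
    """Smallest candidate beta whose signed R^2 exceeds the cutoff (None if none do)."""
    passing = [b for b, r2 in fit_by_beta.items() if r2 > cutoff]
    return min(passing) if passing else None
```

A typical use would evaluate `scale_free_fit(connectivity(cor, b))[0]` for each candidate b (for example 1 to 20), collect the values in a dictionary, and pass it to `pick_soft_threshold`.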
2.4. Gene Enrichment Analysis
Functional enrichment analyses were performed on the hub genes identified from the hypoxia-associated module in the WGCNA analysis. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were conducted using the “clusterProfiler” R package together with the “org.Hs.eg.db” annotation package. Functional terms and pathways with an FDR-adjusted p value < 0.05 were considered significant.
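Over-representation analysis of this kind rests on a hypergeometric test per term, followed by multiple-testing adjustment. The Python sketch below implements both pieces with the standard library. The first function gives the upper-tail probability of finding at least k term-annotated genes among the hub genes. The second applies Benjamini–Hochberg adjustment for the FDR < 0.05 cutoff. It illustrates the statistics only and does not reproduce clusterProfiler's internals (for example, its choice of background gene universe).

```python
from math import comb
from typing import List, Sequence


def hypergeom_upper_p(k: int, n_hits: int, n_term: int, n_universe: int) -> float:
    """P(X >= k) when n_hits genes are drawn from n_universe genes, n_term of them annotated."""
    total = comb(n_universe, n_hits)
    upper = min(n_hits, n_term)
    tail = sum(comb(n_term, i) * comb(n_universe - n_term, n_hits - i)
               for i in range(k, upper + 1))
    return tail / total


def bh_adjust(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running = 1.0
    # Walk from the largest p value down, enforcing monotonicity with a running minimum.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        adjusted[i] = running
    return adjusted
```

As an illustration with invented counts, `hypergeom_upper_p(8, 40, 200, 18000)` gives the probability of observing 8 or more genes of a 200-gene term among 40 hub genes drawn from an 18,000-gene background.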
2.5. Construction of the Prognostic Signature
First, univariate Cox regression analysis was performed to identify hypoxia-related genes significantly associated with patient prognosis. Subsequently, LASSO regression was applied to select the most informative genes and construct the optimal prognostic signature. Patients with GC were then divided into high-risk and low-risk groups using the median signature risk score of the TCGA-GC training set as the cutoff.
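A LASSO-Cox signature is usually applied as a linear risk score, risk = Σ_i β_i · x_i. Here β_i are the shrunken coefficients and x_i the normalized expression values of the signature genes. The sketch below computes that score, fixes the cutoff at the training-cohort median, and reuses the same cutoff for a second cohort, following the text's statement that the median is derived from the training set. The gene names `GENE_A`/`GENE_B` and their coefficients are hypothetical placeholders, not the fitted signature.

```python
from statistics import median
from typing import Dict

Expression = Dict[str, Dict[str, float]]  # hypothetical layout: sample -> gene -> value


def risk_scores(expr: Expression, coefs: Dict[str, float]) -> Dict[str, float]:
    """Linear risk score sum_i beta_i * x_i for every sample."""
    return {s: sum(b * genes[g] for g, b in coefs.items()) for s, genes in expr.items()}


def assign_risk_groups(scores: Dict[str, float], cutoff: float) -> Dict[str, str]:
    """'high' if the score exceeds the cutoff, otherwise 'low' (tie rule assumed)."""
    return {s: ("high" if v > cutoff else "low") for s, v in scores.items()}


# Placeholder coefficients for two hypothetical genes; not the published model.
COEFS = {"GENE_A": 0.2, "GENE_B": 0.1}
```

In use, `cutoff = median(risk_scores(train_expr, COEFS).values())` is computed once on the training cohort. `assign_risk_groups` is then called with that same cutoff for both the training and validation cohorts.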
2.6. Assessment of Prognostic Signature
The predictive accuracy of the hypoxia-related prognostic signature was evaluated using time-dependent receiver operating characteristic (ROC) curve analysis and corresponding area under the curve (AUC) values in both the TCGA-GC training cohort and the independent GEO validation cohort (GSE84437). In addition, clinicopathological features were compared between high- and low-risk groups using the chi-square test for categorical variables and the Wilcoxon rank-sum test for continuous variables. Prognostic determinants were examined through univariate and multivariate Cox models, with hazard ratios (HRs) and 95% confidence intervals (CIs) reported.
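Time-dependent ROC analysis asks, at a horizon t, how well the risk score separates patients who died by t (cases) from patients still alive after t (controls). The sketch below computes that cumulative/dynamic AUC as the fraction of case–control pairs ranked correctly. It uses a simplified estimator that drops patients censored before t. The timeROC package instead weights observations by inverse probability of censoring, so the two will generally give different values on censored data.

```python
from typing import Sequence


def cumulative_dynamic_auc(times: Sequence[float], events: Sequence[int],
                           risks: Sequence[float], t: float) -> float:
    """Naive AUC(t): cases die at or before t, controls survive beyond t.

    Patients censored before t belong to neither group (no censoring weights).
    Tied risk scores count as half-concordant.
    """
    cases = [r for T, e, r in zip(times, events, risks) if T <= t and e]
    controls = [r for T, r in zip(times, risks) if T > t]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at time t")
    score = 0.0
    for rc in cases:
        for rn in controls:
            if rc > rn:
                score += 1.0
            elif rc == rn:
                score += 0.5
    return score / (len(cases) * len(controls))
```

Evaluating the function at t equal to 1, 3, and 5 years mirrors the time points used in the text. Values near 0.5 indicate no discrimination.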
2.7. Construction of Nomogram
Nomograms were constructed based on the calculated risk scores using the “rms” R package. The nomogram integrated clinicopathological variables, including age, sex, N stage, M stage, and the hypoxia-related prognostic signature, to predict 1-, 3-, and 5-year OS in patients with GC. The predictive performance of the model was evaluated at 1, 3, and 5 years using time-dependent ROC curves implemented in the “timeROC” R package, and calibration analysis was conducted to compare predicted survival estimates with observed outcomes.
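A Cox-based nomogram encodes the relation S(t | x) = S0(t)^exp(lp). Here lp = Σ_k β_k (x_k − c_k) is the linear predictor centered at reference values c_k, and S0 is the baseline survival at those reference values. The sketch below evaluates this relation at 1, 3, and 5 years. The coefficients, centers, and baseline survival values are hypothetical numbers, and the functions are not part of the rms API; rms performs equivalent calculations internally when drawing the nomogram.

```python
import math
from typing import Dict, Optional


def linear_predictor(coefs: Dict[str, float], covariates: Dict[str, float],
                     centers: Optional[Dict[str, float]] = None) -> float:
    """lp = sum_k beta_k * (x_k - c_k); centers default to 0."""
    centers = centers or {}
    return sum(b * (covariates[name] - centers.get(name, 0.0)) for name, b in coefs.items())


def predicted_survival(baseline_surv: Dict[float, float], coefs: Dict[str, float],
                       covariates: Dict[str, float],
                       centers: Optional[Dict[str, float]] = None) -> Dict[float, float]:
    """Cox-model survival S(t | x) = S0(t) ** exp(lp) at each requested time point."""
    hazard_ratio = math.exp(linear_predictor(coefs, covariates, centers))
    return {t: s0 ** hazard_ratio for t, s0 in baseline_surv.items()}


# Hypothetical inputs, for illustration only.
BASELINE = {1.0: 0.90, 3.0: 0.70, 5.0: 0.60}
COEFS = {"age": 0.02, "risk_score": 1.0}
CENTERS = {"age": 60.0, "risk_score": 0.0}
```

A patient at the reference values receives the baseline survival unchanged. Any covariate profile with a positive linear predictor receives lower predicted survival at every time point.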
2.8. Immune Infiltration Analysis
Immune cell infiltration in GC samples from the GEO cohort (GSE84437) was evaluated to characterize the tumor immune microenvironment and the functional relevance of immune-related genes. The relative proportions of 22 infiltrating immune cell subsets were estimated using the CIBERSORT algorithm with the LM22 signature matrix and 1000 permutations. Only samples with a CIBERSORT p value < 0.05 were retained for subsequent analyses to ensure reliable deconvolution. The distribution of immune cell fractions across samples and the correlations among immune cell populations were visualized using the ggplot2 (version 3.4.4) and pheatmap (version 1.0.12) R packages. Immune infiltration patterns and immune checkpoint expression were compared between risk groups and visualized as violin plots.
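The two post-processing steps above are keeping only samples with a CIBERSORT p value below 0.05 and correlating immune cell fractions across the retained samples. The sketch below shows both in Python. It assumes a hypothetical layout in which each sample maps to a dictionary holding its deconvolution `p_value` and its cell-type `fractions`. The deconvolution itself (CIBERSORT's support vector regression) is not reproduced.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical layout: sample -> {"p_value": float, "fractions": {cell_type: fraction}}
Results = Dict[str, dict]


def reliable_fractions(results: Results, alpha: float = 0.05) -> Dict[str, Dict[str, float]]:
    """Keep the cell fractions of samples whose deconvolution p value is below alpha."""
    return {s: r["fractions"] for s, r in results.items() if r["p_value"] < alpha}


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cell_type_correlations(fractions: Dict[str, Dict[str, float]]) -> Dict[Tuple[str, str], float]:
    """Pairwise Pearson correlations between cell types across retained samples."""
    samples = list(fractions)
    cells = sorted(fractions[samples[0]])
    cols = {c: [fractions[s][c] for s in samples] for c in cells}
    return {(a, b): pearson(cols[a], cols[b]) for a in cells for b in cells}
```

The resulting pairwise dictionary corresponds to the correlation heatmap described in the text. A cell type with zero variance across samples would need to be dropped first to avoid division by zero.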
2.9. Consensus Clustering Analysis of Hypoxia-Related Genes
Consensus clustering analysis was performed using the ConsensusClusterPlus R package to classify GC samples into distinct molecular subtypes based on the expression profiles of hypoxia-related genes. Batch effects were adjusted prior to clustering to minimize technical bias. The number of clusters (k) was tested from 2 to 9 using the partitioning around medoids (PAM) method with 1 − Pearson correlation as the distance metric. The optimal number of clusters (k = 3) was selected based on the delta area plot and the stability of the consensus heatmaps.
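Consensus clustering repeatedly subsamples the data and clusters each subsample (here, PAM on 1 − Pearson distance). For every pair of samples, it records the fraction of runs in which both were drawn and landed in the same cluster. The sketch below shows the distance and this consensus-matrix bookkeeping in Python. The per-run clustering step is assumed to happen elsewhere, and each run is represented, hypothetically, as a dictionary from the index of a sampled item to its cluster label.

```python
import math
from typing import Dict, List, Sequence


def one_minus_pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Distance 1 - r between two expression profiles (0 = identical shape, 2 = opposite)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 1.0 - sxy / math.sqrt(sxx * syy)


def consensus_matrix(n_items: int, runs: List[Dict[int, int]]) -> List[List[float]]:
    """M[i][j] = (#runs with i, j co-clustered) / (#runs in which both were sampled)."""
    together = [[0] * n_items for _ in range(n_items)]
    sampled = [[0] * n_items for _ in range(n_items)]
    for labels in runs:
        drawn = list(labels)
        for a in drawn:
            for b in drawn:
                sampled[a][b] += 1
                if labels[a] == labels[b]:
                    together[a][b] += 1
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n_items)] for i in range(n_items)]
```

Values near 0 or 1 indicate stable assignments. A well-chosen k yields a consensus matrix whose entries are mostly close to these extremes, which is what the delta area plot and consensus heatmaps summarize.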
2.10. Single-Cell RNA Sequencing Analysis
Raw sequencing data from the GSE163558 dataset were initially processed using Cell Ranger (version 6.1.2) for read alignment, barcode processing, and generation of gene–cell count matrices. Downstream analyses were performed using the Seurat R package (version 4.3.0). Cells were excluded if they expressed fewer than 400 or more than 5000 genes, if mitochondrial genes accounted for more than 10% of their expression, or if erythrocyte genes accounted for more than 3%. After quality control, data normalization, scaling, and identification of highly variable genes were conducted in Seurat, and the 1500 most highly variable genes were carried forward for downstream analyses. Principal component analysis (PCA) was performed on the scaled data, and the top 20 principal components were used for clustering. Cell populations were defined by Louvain clustering (resolution 0.5) and visualized using t-SNE. Marker genes for each cluster were identified using FindAllMarkers with thresholds of |log2 fold change| > 0.25, minimum fraction of expressing cells (min.pct) > 0.25, and adjusted p value < 0.05 (Supplementary File S1). The resulting 12 clusters were annotated using the SingleR package with the Human Primary Cell Atlas reference dataset and grouped into seven major cell types. Cluster marker genes were further analyzed and visualized using heatmaps, t-SNE plots, and bubble plots.
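The cell-level quality-control thresholds above can be written as a simple predicate. In the Python sketch below, mitochondrial genes are identified by the "MT-" prefix (the usual convention for human gene symbols) and erythrocyte genes by a small hemoglobin gene set. Both gene definitions are assumptions, since the text does not list them, and the code is not Seurat's API.

```python
from dataclasses import dataclass
from typing import Dict

# Assumed erythrocyte (hemoglobin) gene set; the text does not specify one.
HB_GENES = {"HBA1", "HBA2", "HBB", "HBD", "HBM", "HBQ1", "HBZ", "HBE1", "HBG1", "HBG2"}


@dataclass
class CellQC:
    n_genes: int      # number of genes with non-zero counts
    pct_mito: float   # % of counts from mitochondrial genes
    pct_hb: float     # % of counts from hemoglobin genes


def qc_metrics(counts: Dict[str, int]) -> CellQC:
    """Compute per-cell QC metrics from a gene -> UMI count mapping."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cell has no counts")
    n_genes = sum(1 for c in counts.values() if c > 0)
    mito = sum(c for g, c in counts.items() if g.startswith("MT-"))
    hb = sum(c for g, c in counts.items() if g in HB_GENES)
    return CellQC(n_genes, 100.0 * mito / total, 100.0 * hb / total)


def passes_qc(cell: CellQC, min_genes: int = 400, max_genes: int = 5000,
              max_mito: float = 10.0, max_hb: float = 3.0) -> bool:
    """Keep cells with 400-5000 genes, <= 10% mitochondrial and <= 3% erythrocyte counts."""
    return (min_genes <= cell.n_genes <= max_genes
            and cell.pct_mito <= max_mito
            and cell.pct_hb <= max_hb)
```

Cells exactly at a threshold are kept here, matching the wording "fewer than", "more than", and "exceeded" in the text.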
2.11. Pseudotime Analysis
CAFs were operationally defined as the stromal cluster annotated based on canonical fibroblast or ECM markers. All CAF-annotated cells retained after quality control were extracted for trajectory inference (n = 238). Pseudotime analysis was conducted using the Monocle R package, in which ordering genes were selected as highly variable genes or cluster marker genes within the CAF subset, followed by dimensionality reduction using the DDRTree method and cell ordering along the reconstructed trajectory.
2.12. Quantitative Real-Time PCR (qRT-PCR)
GC tissues and paired para-carcinoma tissues were obtained from five patients at the Endoscopy Center of Wuhan Union Hospital, with written informed consent from all participants. The study protocol was approved by the Institutional Review Board of Tongji Medical College, Huazhong University of Science and Technology (IORG No. IORG0003571). Total RNA was extracted using TRIzol (Vazyme, Nanjing, China), and RNA purity was assessed by spectrophotometry; only samples with A260/A280 ratios between 1.8 and 2.0 were used for downstream analyses. For reverse transcription, 1 μg of total RNA was converted into cDNA using a commercial reverse transcription kit (Vazyme, China). Gene expression was quantified by qRT-PCR using SYBR Premix Ex Taq (Vazyme, China). Each 20 μL reaction contained SYBR Premix, gene-specific primers, diluted cDNA, and nuclease-free water. The amplification protocol consisted of initial denaturation at 95 °C for 30 s, followed by 40 cycles of 95 °C for 5 s and 60 °C for 30 s. Amplification specificity was confirmed by melt curve analysis. Expression values were normalized to GAPDH and calculated using the 2^−ΔΔCt method. Primer sequences used in this study are listed in Supplementary File S2.
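The 2^−ΔΔCt calculation first normalizes the target gene's Ct to GAPDH within each sample (ΔCt). It then compares the tumor with its paired para-carcinoma sample (ΔΔCt = ΔCt_tumor − ΔCt_para-carcinoma). A minimal Python sketch follows. It assumes technical replicates are averaged before subtraction and that amplification efficiency is close to 100%, as the method requires. The Ct values in the example are invented.

```python
from statistics import mean
from typing import Sequence


def relative_expression(target_tumor: Sequence[float], ref_tumor: Sequence[float],
                        target_normal: Sequence[float], ref_normal: Sequence[float]) -> float:
    """Fold change of a target gene in tumor vs. paired para-carcinoma tissue (2^-ddCt).

    Each argument holds the Ct values of technical replicates; ref_* are the
    reference-gene (e.g., GAPDH) Ct values measured in the same sample.
    """
    d_ct_tumor = mean(target_tumor) - mean(ref_tumor)      # dCt in tumor
    d_ct_normal = mean(target_normal) - mean(ref_normal)   # dCt in para-carcinoma
    dd_ct = d_ct_tumor - d_ct_normal
    return 2.0 ** (-dd_ct)
```

For example, a target with Ct 24 (tumor) and 26 (para-carcinoma), against GAPDH Ct 18 in both samples, gives ΔΔCt = 6 − 8 = −2. That corresponds to a fourfold higher expression in the tumor.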
2.13. Statistical Analysis
All statistical analyses were performed using R software (version 4.2.1) and GraphPad Prism (version 6.0; GraphPad Software, San Diego, CA, USA). Data normality was assessed before applying parametric tests. Differences between two groups were evaluated using Student’s t-test for normally distributed variables and the Wilcoxon rank-sum test for non-normally distributed variables. For multiple comparisons in high-dimensional analyses, the false discovery rate (FDR) was controlled using the Benjamini–Hochberg method. OS was defined as the time from initial diagnosis to death or last follow-up. Survival analyses were conducted using the Kaplan–Meier (K–M) method, and group differences were compared by the log-rank test. HRs and 95% CIs were derived from Cox regression models, and the proportional hazards assumption was verified using Schoenfeld residuals. The prognostic performance of the hypoxia-related signature was further evaluated using time-dependent ROC analysis and concordance indices. A p value < 0.05 was considered statistically significant.
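The text does not specify which concordance estimator was used; Harrell's concordance index is one common form. It is the proportion of usable patient pairs in which the patient who died earlier had the higher risk score. The minimal Python sketch below assumes two conventions: pairs with tied survival times are skipped, and tied risk scores count as one half.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risks: Sequence[float]) -> float:
    """Harrell's C: among pairs where patient i has an event before j's time,
    count the fraction in which i also has the higher risk score."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[j] > times[i]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ranking of survival times by the signature score.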
4. Discussion
GC, one of the most common malignant tumors worldwide, poses a substantial public health burden and significantly reduces life expectancy. Rapid tumor cell proliferation combined with insufficient or delayed angiogenesis often leads to inadequate blood perfusion within tumor tissues, thereby promoting the formation of a hypoxic TME. In recent years, tumor hypoxia has emerged as a central focus in cancer research, given its pivotal involvement in cancer progression, metastasis, therapeutic resistance, prognosis, and treatment responsiveness [17]. Several hypoxia-related gene signatures have been proposed for prognostic stratification in GC; however, their accuracy, biological interpretability, cell-type specificity, and clinical tractability remain incompletely elucidated.
In the present study, hypoxia emerged as an independent prognostic risk factor in GC: patients with higher hypoxia ssGSEA scores showed significantly shorter overall survival than those with lower scores. Using WGCNA, we identified the gene modules most strongly associated with hypoxia and subsequently established a hypoxia-related prognostic gene signature through LASSO regression and multivariate Cox analysis. The stability of the prognostic signature was confirmed in both the TCGA and GEO cohorts, with high-risk HYS patients consistently exhibiting inferior survival compared with those in the low-risk group. ROC analysis further indicated that the hypoxia-related gene signature has a moderate but clinically meaningful ability to predict prognosis in GC. Subgroup analyses showed that the adverse prognostic impact of hypoxia was evident across multiple clinicopathological strata, including patients aged ≤60 years, female patients, and patients with advanced disease (stage III–IV), higher T stage (T3–4), nodal involvement (N1–3), or non-metastatic status (M0). Moreover, AXL, NRP1, SPARC, and VCAN were significantly upregulated in GC tissues compared with paired para-carcinoma samples. Because para-carcinoma tissues may not represent truly normal mucosa, additional validation using GTEx normal gastric tissues was performed and confirmed consistent overexpression of these genes in GC, supporting the robustness of our results. Collectively, these findings underscore the important role of hypoxia in GC progression and demonstrate the potential clinical relevance of hypoxia-associated molecular signatures for prognostic stratification.
The four genes comprising the hypoxia-related signature (SPARC, AXL, VCAN, and NRP1) have all been implicated in GC progression and tumor–stromal interactions. SPARC, a stromal cell-associated glycoprotein, has been reported to inhibit GC cell proliferation by inducing apoptosis, although its downregulation has also been associated with improved prognosis in GC patients [18,19]. AXL, a receptor tyrosine kinase, plays a pivotal role in tumor–stromal interactions; Bae et al. demonstrated that inhibition of the GAS6/AXL axis suppresses tumor progression by disrupting the crosstalk between GC-associated fibroblasts and cancer cells [20]. VCAN, a major component of the extracellular matrix, has emerged as a prognostically relevant molecule in GC and is implicated in promoting tumor growth and invasion [21]. In addition, NRP1, a multifunctional co-receptor, has been reported to facilitate GC growth and migration [22]. Consistent with these reports, our single-cell analysis suggested that these genes are relatively enriched in CAFs, with NRP1 also showing notable expression in endothelial cells. These observations support the notion that hypoxia-associated transcriptional programs may be preferentially engaged within stromal components of the GC microenvironment. Pseudotime analysis further suggested gradual changes in the expression of hypoxia-related genes along inferred CAF state transitions: SPARC and VCAN declined gradually, whereas AXL and NRP1 displayed a characteristic down–up–down expression trend.
A heterogeneous population of innate and adaptive immune cells within the TME is pivotal to the development and progression of GC [23]. Our immune infiltration analysis showed that T cells constituted the dominant immune cell population within GC tissues. Previous studies have reported that the abundance of CD4+ naive T cells, CD4+ memory T cells, CD8+ T cells, and activated CD8+ T cells increases with tumor progression, highlighting the central role of T cell-mediated immunity in cancer development [24]. Notably, we further observed a significantly higher proportion of M2 macrophages in the high-risk HYS group. M2-polarized macrophages are known to exert pro-tumorigenic effects by promoting tumor growth, angiogenesis, and metastasis, thereby contributing to an immunosuppressive microenvironment and unfavorable clinical outcomes [25].
Immune checkpoint molecules can suppress anti-tumor immune responses and facilitate tumor immune evasion through interactions with their cognate ligands or receptors on cancer cells [26,27]. Several key immune checkpoint molecules have recently been identified as promising immunotherapeutic targets, and immune checkpoint inhibitors have been widely implemented in clinical oncology. Our analyses showed that immune checkpoint expression profiles differed between the high-risk and low-risk HYS groups, suggesting a close link between hypoxic status and immune checkpoint regulation. Notably, PDCD1LG2 (PD-L2), a critical immunosuppressive molecule that inhibits immune-mediated tumor cell killing, was significantly upregulated in the high-risk HYS group. Previous studies have shown that PD-1/PD-L2 interactions play a functional role in suppressing anti-tumor immune responses within the GC TME [28,29]. These results may partly explain the adverse prognostic impact of heightened hypoxia in GC.
Several limitations should be considered when interpreting these findings. First, the qRT-PCR validation was exploratory, used a small sample size, and focused on a limited number of candidate genes; therefore, multiple-testing correction was not applied. These results should be interpreted with caution and require confirmation in larger, independent clinical cohorts using additional reference genes. Second, the single-cell RNA sequencing analysis was based on only three GC samples, which limits generalizability and precludes a robust assessment of inter-patient heterogeneity. Third, although the proposed hypoxia-related prognostic signature demonstrated statistically significant prognostic value, it was not directly benchmarked against previously published GC prognostic models or advanced machine learning-based survival algorithms. Its relative performance compared with established gene signatures or nonlinear survival models therefore remains to be determined. In addition, the gene selection strategy employed in this study, which integrates WGCNA module membership, Cox regression, and LASSO shrinkage, primarily identifies co-expressed genes with linear associations with survival outcomes. While this framework enhances interpretability and clinical applicability, it does not explicitly model condition-specific differential expression, nonlinear effects, or higher-order interactions. Future studies incorporating complementary differential expression analyses and machine learning-based survival modeling may further refine feature prioritization and deepen mechanistic interpretation.
In conclusion, we systematically analyzed GC data from public databases to construct a prognostic signature of hypoxia-related genes. Using the median signature score, patients were stratified into high-risk and low-risk HYS subgroups, with consistently poorer survival observed in patients with elevated HYS. The signature genes were significantly upregulated in GC tissues relative to para-carcinoma and normal gastric samples. Moreover, the hypoxia-related signature correlated with distinct immune infiltration profiles and checkpoint expression, implying that immune suppression within the TME may contribute to the poor prognosis of high-risk HYS patients. In addition, our single-cell and pseudotime analyses suggested that the hypoxia-related signature genes may be involved in CAF state transitions and differentiation in GC.
Collectively, these findings provide new insights into the multifaceted role of hypoxia in GC progression. Our study also supports the value of integrating bioinformatics analyses with immune landscape characterization, with potential implications for prognostic stratification and the development of more effective immunotherapeutic strategies. Nevertheless, further work is warranted: prospective validation in larger, independent cohorts, systematic benchmarking against existing prognostic models, and mechanistic experiments to elucidate how hypoxia-related genes shape stromal–immune interactions, immune regulation, clinical outcomes, and hypoxia-driven GC progression.