Abstract
Colorectal cancer (CRC) is a major cause of cancer death, and both the tumor microenvironment and gene expression influence patient outcomes. Identifying survival-associated epithelial marker genes (EMGs) may improve prognostication and guide therapy. We obtained single-cell RNA-sequencing (scRNA-seq) data from CRC patients (n = 23,176 cells) from the TISCH database and identified EMGs through differential expression analysis; these genes were then intersected with malignant cell markers. Bulk RNA-seq data from TCGA-COAD (n = 375) were used to assess the prognostic value of the EMGs via univariable Cox analysis followed by LASSO regression, and the selected genes were evaluated in multivariable Cox models. An EMG-based risk score was developed and validated in GSE39582 (n = 585) and GSE17536 (n = 177). Immune infiltration was assessed using the xCell and TIMER algorithms. A total of 107 EMGs were identified and assessed in the TCGA data. Univariable Cox analysis identified 18 survival-related EMGs, which LASSO narrowed to SPINK1 and TIMP1. Multivariable analysis confirmed SPINK1 (HR: 0.88, 95% CI: 0.79–0.97, p = 0.009) and TIMP1 (HR: 1.66, 95% CI: 1.29–2.13, p < 0.001) as independent survival predictors. Based on the risk score, patients were classified into high-risk (n = 187) and low-risk (n = 188) groups. The low-risk group had significantly better overall and disease-free survival. Immune profiling revealed distinct patterns: the high-risk group showed higher levels of dendritic cells, memory T cells, and macrophages and higher immune checkpoint expression, whereas the low-risk group showed enrichment of NK cells, plasma cells, and CD4+ T-helper cells. These findings were validated in the GSE39582 and GSE17536 cohorts. EMGs have prognostic value in CRC, with SPINK1 and TIMP1 as independent survival predictors. The distinct immune patterns support integrating EMGs with immune profiling for improved risk stratification and personalized treatment.
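
As an illustration of how a two-gene risk score of this kind could be computed, the following minimal Python sketch uses the hazard ratios reported above. It rests on two assumptions that the abstract does not state: that the score is a linear combination of expression values weighted by the log hazard ratios, and that patients are split at the cohort median (which would be consistent with the 187/188 split in a cohort of 375). The function names and the expression values are hypothetical and are not data from the study.

```python
import math
from statistics import median

# Hazard ratios reported in the abstract (multivariable Cox model).
HAZARD_RATIOS = {"SPINK1": 0.88, "TIMP1": 1.66}

# ASSUMPTION: the score weights are the log hazard ratios (Cox coefficients).
COEFFICIENTS = {gene: math.log(hr) for gene, hr in HAZARD_RATIOS.items()}


def risk_score(expression):
    """Sum of coefficient * expression over the two signature genes."""
    return sum(COEFFICIENTS[gene] * expression[gene] for gene in COEFFICIENTS)


def stratify(scores):
    """ASSUMPTION: median cutoff; scores above the median are 'high' risk."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical expression values, for illustration only.
patients = {
    "P1": {"SPINK1": 8.0, "TIMP1": 5.0},
    "P2": {"SPINK1": 2.0, "TIMP1": 9.0},
    "P3": {"SPINK1": 5.0, "TIMP1": 7.0},
}

scores = {pid: risk_score(expr) for pid, expr in patients.items()}
groups = stratify(scores)

for pid in patients:
    print(pid, round(scores[pid], 3), groups[pid])
```

Under these assumptions, higher TIMP1 expression raises the score (HR > 1) and higher SPINK1 expression lowers it (HR < 1). With an odd number of patients, the patient at the median falls into the low-risk group.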