Article

Construction and Evaluation of a Domain-Related Risk Model for Prognosis Prediction in Colorectal Cancer

1 School of Life Science and Technology, Inner Mongolia University of Science and Technology, Baotou 014010, China
2 Inner Mongolia Key Laboratory of Life Health and Bioinformatics, Inner Mongolia University of Science and Technology, Baotou 014010, China
3 School of Economics and Management, Inner Mongolia University of Science and Technology, Baotou 014010, China
* Author to whom correspondence should be addressed.
Computation 2025, 13(7), 171; https://doi.org/10.3390/computation13070171
Submission received: 12 June 2025 / Revised: 8 July 2025 / Accepted: 14 July 2025 / Published: 17 July 2025

Abstract

Background: Epigenomic instability accelerates mutations in tumor suppressor genes and oncogenes, contributing to malignant transformation. Histone modifications, particularly methylation and acetylation, significantly influence tumor biology, with chromo-, bromo-, and Tudor domain-containing proteins mediating these changes. This study investigates how genes encoding these domain-containing proteins affect colorectal cancer (CRC) prognosis. Methods: Using CRC data from the GSE39582 and TCGA datasets, we identified domain-related genes via GeneCards and developed a prognostic signature using LASSO-Cox regression. Patients were classified into high- and low-risk groups, and the groups were compared in terms of survival, clinical features, immune cell infiltration, immunotherapy response, and predicted drug sensitivity. Single-cell analysis assessed gene expression in different cell subsets. Results: Four domain-related genes (AKAP1, ORC1, CHAF1A, and UHRF2) were identified as a prognostic signature. Validation confirmed their prognostic value, with significant differences in survival, clinical features, immune patterns, and immunotherapy responses between the high- and low-risk groups. Drug sensitivity analysis revealed top candidates for CRC treatment. Single-cell analysis showed varied expression of these genes across cell subsets. Conclusions: This study presents a novel prognostic signature based on domain-related genes that can predict CRC severity and offer insights into immune dynamics, providing a promising tool for personalized risk assessment in CRC.

1. Introduction

Tumors are abnormal masses of tissue that arise from uncontrolled cell proliferation. Depending on their growth characteristics, they are classified as either benign or malignant tumors. Benign tumors typically exhibit slow growth and are not prone to spreading. They generally cause minimal damage to the surrounding tissues and can often be effectively treated through surgical removal. Common examples of benign tumors include lipomas and uterine fibroids. On the other hand, malignant tumors, or cancers, grow aggressively and have the potential to invade and destroy surrounding healthy tissues. These tumors are metastatic, meaning they can spread to other parts of the body via the bloodstream or lymphatic system, intensifying the severity of the disease. As a result, malignant tumors present more complex treatment challenges and are associated with a significantly higher mortality rate [1].
Cancer-related mortality is a mounting concern, and CRC now ranks as the fourth most lethal cancer globally. Its rising burden can be attributed to multiple factors, such as an aging population, unhealthy dietary patterns, smoking, physical inactivity, and obesity. Both genetic susceptibility and environmental influences are critical in the development of CRC [2].
Tumor suppressor genes can be rendered inactive either by epigenetic changes or mutations in DNA sequences, ultimately resulting in the onset of cancer [3]. Epigenetic mechanisms such as posttranslational histone modifications, DNA methylation, and non-coding RNA regulation are critical in cancer initiation and progression. Dysregulation of these processes can promote abnormal cell growth and proliferation.
Current research extensively explores the impact of two histone modifications—acetylation and methylation—on CRC [4]. Histone acetylation, a reversible process, involves the addition or removal of acetyl groups by histone acetyltransferases (HATs) and deacetylases (HDACs). Histone methylation, on the other hand, involves the transfer of methyl groups to histone residues, mediated by histone methyltransferases (HMTs) using S-adenosylmethionine as a methyl donor, and is regulated by histone demethylases (HDMs).
Bromodomain-containing proteins recognize acetylated histones. Each bromodomain consists of approximately 110 amino acids folded into a bundle of four α-helices: αZ, αA, αB, and αC. The loops between these helices, specifically the αZ-αA (ZA) and αB-αC (BC) loops, form a hydrophobic pocket that recognizes the acetyllysine residues generated by histone acetylation. This binding can trigger various biological effects, such as chromatin remodeling, transcriptional activation, and the recruitment of other proteins involved in gene regulation. Bromodomain-containing proteins are classified into eight subclasses based on their structural similarities. These proteins play a crucial role in cancer development and are key targets for anticancer therapies. For instance, the bromodomain protein ATAD2 has emerged as a promising target in cancer treatment. Small-molecule inhibitors of bromodomain proteins have shown promising results in preclinical studies and are currently under investigation in clinical trials [5].
Chromodomain proteins, which bind methylated lysines in histone tails, play a dual role in cancer biology, both suppressing and promoting tumor growth. A chromodomain typically comprises around 40–70 amino acids and is highly conserved across evolution. Structurally, it consists of three antiparallel β-strands followed by a C-terminal α-helix. Mutations in these proteins have been identified in colorectal tumors, suggesting their potential as therapeutic targets [6,7].
Like the chromodomain, the Tudor domain recognizes and binds methylated histones. Named after the Drosophila Tudor gene, this domain consists of approximately 60 amino acids and forms a barrel-like structure with 4–5 antiparallel β-strands. Many Tudor domain-containing proteins regulate RNA metabolism and germ cell development; examples include SGF29, Spindlin1, UHRF1, PHF1, and SHH1. These proteins are crucial to these processes, and their dysregulation is linked to cancer [8,9,10].
Proteins containing the malignant brain tumor (MBT) domain or the plant homeodomain (PHD) also recognize and bind methylated histones. L3MBTL3, which contains the MBT domain, plays a critical role in regulating the stability of various oncogenic proteins and key epigenetic regulatory factors, thereby influencing the onset and progression of cancer [11]. Furthermore, PHF5A, which contains a PHD finger domain, plays a pivotal regulatory role in the initiation and progression of various malignant tumors, including CRC [12]. Despite these findings, our review of the existing literature indicated that research on the association between these two domains and CRC remains limited. In addition, during data collection we found relatively few genes encoding proteins with these domains, so MBT and PHD domain-containing proteins were excluded from the scope of this study.
We investigated the association between CRC prognosis and genes from the chromo-, bromo-, and Tudor domain-containing protein family (CBT DCPFGs), selected according to the domains of their encoded proteins. We gathered gene expression and clinical data from public databases and, through differential expression analysis and Cox and LASSO regressions, pinpointed genes strongly linked to CRC prognosis. These genes were then used to construct a risk assessment model for predicting outcomes in CRC patients. Our study offers a fresh perspective on the molecular underpinnings of CRC prognosis.

2. Materials & Methods

2.1. Data Sources

The GSE39582 dataset, sourced from the GEO database (https://www.ncbi.nlm.nih.gov/geo/, accessed on 13 October 2023), comprises mRNA expression profiles of 566 CRC samples and 19 normal colorectal tissue samples, each measured with 54,675 probes. Using the annotation information for the GPL570 platform in the “hgu133plus2.db” R package (version 3.13.0) [13], we converted the probe IDs into gene symbols. Where multiple probes mapped to the same gene symbol, their mean expression was used as that gene’s expression level. This mapping yielded 20,824 genes for further analysis. Clinical characteristics associated with the samples, such as gender, age, overall survival (OS) time, survival status, pathological stage, AJCC-TNM stage, and mutation status for KRAS, BRAF, and TP53, were retrieved using the “GEOquery” R package (version 2.68.0) [14]. Additionally, RNA-Seq data of CRC were obtained from The Cancer Genome Atlas (TCGA) database (https://portal.gdc.cancer.gov, accessed on 1 November 2023), comprising 517 samples (476 CRC samples and 41 normal samples) with expression data for 59,427 genes. To retain adequately expressed genes, we kept those with a mean expression level above 1, resulting in a final selection of 18,806 genes for further analysis.
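The probe-to-gene collapsing and the mean-expression filter described above can be sketched in plain Python. This is a minimal illustration, not the authors' R code: the dictionary layout, the function names `collapse_probes` and `filter_by_mean`, and the toy probe IDs and gene symbols (`GENEA`, `GENEB`) are hypothetical, and the threshold of 1 is assumed to apply to whatever scale the expression values are stored on.

```python
from collections import defaultdict
from statistics import mean


def collapse_probes(probe_expr, probe_to_symbol):
    """Average all probes that map to the same gene symbol.

    probe_expr: dict probe_id -> list of expression values (one per sample)
    probe_to_symbol: dict probe_id -> gene symbol; unmapped probes are dropped
    """
    grouped = defaultdict(list)
    for probe, values in probe_expr.items():
        symbol = probe_to_symbol.get(probe)
        if symbol:  # skip probes without an annotation
            grouped[symbol].append(values)
    # element-wise mean across the probes of each gene, sample by sample
    return {sym: [mean(col) for col in zip(*rows)] for sym, rows in grouped.items()}


def filter_by_mean(gene_expr, min_mean=1.0):
    """Keep genes whose mean expression across samples exceeds min_mean."""
    return {gene: vals for gene, vals in gene_expr.items() if mean(vals) > min_mean}


# Toy data (hypothetical probe IDs and symbols); p4 has no annotation.
probes = {"p1": [2.0, 4.0], "p2": [4.0, 6.0], "p3": [0.5, 0.7], "p4": [9.0, 9.0]}
annotation = {"p1": "GENEA", "p2": "GENEA", "p3": "GENEB"}
genes = collapse_probes(probes, annotation)   # {'GENEA': [3.0, 5.0], 'GENEB': [0.5, 0.7]}
kept = filter_by_mean(genes)                  # {'GENEA': [3.0, 5.0]}
```

With this toy input, `GENEA` averages its two probes and is kept, `GENEB` (mean 0.6) is removed by the filter, and the unannotated probe `p4` is discarded before averaging.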
We investigated CBT DCPFGs by extracting relevant information from GeneCards (https://www.genecards.org, accessed on 28 October 2023). Using the search terms “bromo domain protein”, “chromo domain protein”, and “Tudor domain protein” on the website’s homepage, we identified 33, 44, and 59 relevant genes, respectively, totaling 136 genes (Supplementary Table S1).
To ensure uniformity in gene symbols across the different datasets, we standardized the gene symbols of CBT DCPFGs in both the GEO and TCGA expression matrices. For example, if a gene named “SMA4” appeared in the GSE39582 dataset, we mapped it to “SMN1” to align with the gene symbol used in the TCGA expression matrix.
Additionally, Mutation Annotation Format (MAF) data for 455 CRC samples were downloaded from the TCGA database. These data describe single nucleotide variants (SNVs) and small insertions and deletions (indels). The mutation data were matched to the gene expression data by sample ID. The flowchart illustrating the data processing steps for these analyses is presented in Figure 1.

2.2. Differential Expression Analysis and Enrichment Analysis

Differences in gene expression in the GSE39582 dataset were analyzed using the “limma” R package (version 3.56.2) [15]. Differentially expressed genes were filtered based on |logFC| > 1.5 and padj < 0.05. Concurrently, differential expression analysis was conducted for the TCGA-derived expression data. We identified CBT DCPFGs showing differential expression in both datasets.
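The DEG filter and cross-dataset intersection can be sketched as follows, assuming each dataset's results are available as a mapping from gene symbol to `(logFC, padj)`, for example exported from limma's `topTable`. The function names and toy genes are hypothetical, and requiring the same direction of change in both datasets is an assumption; the text above only states that the genes were differentially expressed in both.

```python
def select_degs(results, lfc_cutoff=1.5, padj_cutoff=0.05):
    """Return {gene: 'up' or 'down'} for genes with |logFC| > lfc_cutoff and padj < padj_cutoff.

    results: dict gene -> (logFC, padj), e.g. exported from a limma topTable.
    """
    degs = {}
    for gene, (lfc, padj) in results.items():
        if abs(lfc) > lfc_cutoff and padj < padj_cutoff:
            degs[gene] = "up" if lfc > 0 else "down"
    return degs


def shared_degs(degs_a, degs_b):
    """Genes called differentially expressed in both datasets with the same direction."""
    return {gene: d for gene, d in degs_a.items() if degs_b.get(gene) == d}


# Toy results (hypothetical genes A-D): C fails the fold-change cut, D the padj cut.
geo = {"A": (2.0, 0.01), "B": (-1.8, 0.001), "C": (1.0, 0.001), "D": (3.0, 0.2)}
tcga = {"A": (1.6, 0.03), "B": (2.0, 0.01)}
common = shared_degs(select_degs(geo), select_degs(tcga))  # {'A': 'up'}
```

Gene B passes the thresholds in both toy datasets but changes in opposite directions, so it is excluded under the concordance assumption.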
For consistent gene identification, gene symbols were converted to Entrez Gene IDs using the “org.Hs.eg.db” R package (version 3.17.0) [16]. We then performed Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses of the differentially expressed CBT DCPFGs using the “clusterProfiler” R package (version 4.8.3) [17,18]. The results were visualized using the “ggplot2” R package (version 3.5.2) [19]. A padj threshold of 0.05 was applied across all of these analyses.

2.3. Construction of a Domain-Related Risk Model

To construct a domain-related risk model, we first merged the GEO expression data with the corresponding clinical information by sample ID to form a unified dataset. To identify differentially expressed CBT DCPFGs linked to prognosis, we then performed univariate Cox regression on this dataset using the expression matrix, survival status, and survival time. A gene was considered prognostically significant if its hazard ratio (HR) differed significantly from 1 (p < 0.05).
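For readers less familiar with univariate Cox screening, the sketch below fits a single-covariate Cox model by Newton-Raphson on the partial likelihood and returns the hazard ratio with a two-sided Wald p-value. It is an illustrative Python re-implementation under stated assumptions, not the authors' code: ties are handled with the Breslow approximation (R's `survival::coxph` defaults to Efron, which differs only when event times are tied), and `univariate_cox` is a hypothetical name. A gene would pass the screen described above when the returned p-value is below 0.05.

```python
import math


def _partial_stats(times, events, x, beta):
    """Log partial likelihood, score and information at beta (Breslow ties)."""
    loglik = score = info = 0.0
    for ti, ei, xi in zip(times, events, x):
        if not ei:
            continue  # censored subjects contribute only through risk sets
        # risk set: all subjects still under observation at time ti
        risk = [xj for tj, xj in zip(times, x) if tj >= ti]
        w = [math.exp(beta * xj) for xj in risk]
        s0 = sum(w)
        s1 = sum(wj * xj for wj, xj in zip(w, risk))
        s2 = sum(wj * xj * xj for wj, xj in zip(w, risk))
        loglik += beta * xi - math.log(s0)
        score += xi - s1 / s0
        info += s2 / s0 - (s1 / s0) ** 2
    return loglik, score, info


def univariate_cox(times, events, x, max_iter=50, tol=1e-8):
    """Fit h(t | x) = h0(t) * exp(beta * x); return (hazard ratio, Wald p-value)."""
    beta = 0.0
    loglik, score, info = _partial_stats(times, events, x, beta)
    for _ in range(max_iter):
        step = score / info  # Newton-Raphson step
        new = _partial_stats(times, events, x, beta + step)
        # halve the step until the (concave) log partial likelihood does not drop
        while new[0] < loglik and abs(step) > tol:
            step /= 2.0
            new = _partial_stats(times, events, x, beta + step)
        beta += step
        loglik, score, info = new
        if abs(step) < tol:
            break
    se = 1.0 / math.sqrt(info)
    p_value = math.erfc(abs(beta / se) / math.sqrt(2.0))  # two-sided normal test
    return math.exp(beta), p_value


# Toy cohort: higher x tends to go with earlier deaths (events: 1 = death).
times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
events = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]
x = [2.0, 1.5, 1.0, 1.8, 0.5, 0.2, 1.2, 0.1, 0.3, 0.0]
hr, p = univariate_cox(times, events, x)
```

Running `univariate_cox` on each gene's expression vector in turn, with the same survival times and event indicators, and keeping genes with p < 0.05 mirrors the screening step. Step-halving is included because a plain Newton step can overshoot on small samples.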
After identifying candidate prognostic genes, we applied least absolute shrinkage and selection operator (LASSO) regression using the “glmnet” R package (version 4.1-8) to select the most informative genes among them [20]. The penalty parameter lambda was chosen by 10-fold cross-validation as the value that minimized the cross-validated error (lambda.min). The genes retaining nonzero coefficients at this lambda were used to construct the final domain-related risk model.
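The choice of the cross-validated lambda can be illustrated independently of the LASSO fit itself. The sketch assumes the held-out errors (for a Cox model, glmnet's `cv.glmnet` reports partial-likelihood deviance by default) have already been computed for each fold and each candidate lambda. `assign_folds` and `lambda_min` are hypothetical helper names, and the fold assignment imitates the balanced random split that `cv.glmnet` uses when no `foldid` is supplied.

```python
import random
from statistics import mean


def assign_folds(n_samples, n_folds=10, seed=1):
    """Balanced random fold labels 0..n_folds-1, one per sample."""
    fold_ids = [i % n_folds for i in range(n_samples)]
    random.Random(seed).shuffle(fold_ids)
    return fold_ids


def lambda_min(lambdas, cv_errors):
    """Return the lambda with the smallest mean held-out error.

    cv_errors[k][j]: error on fold k of the model fitted without fold k,
    at penalty lambdas[j] (e.g. Cox partial-likelihood deviance).
    """
    mean_err = [mean(fold[j] for fold in cv_errors) for j in range(len(lambdas))]
    best = min(range(len(lambdas)), key=mean_err.__getitem__)
    return lambdas[best]


folds = assign_folds(25)  # 10 folds of size 2 or 3
chosen = lambda_min([0.3, 0.1, 0.03, 0.01],
                    [[5, 4, 3, 3.5], [5.2, 4.1, 2.9, 3.6]])  # 0.03
```

In the toy call, the mean errors are 5.1, 4.05, 2.95, and 3.55, so the third lambda is selected.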
The Domain-Related Score (DRS) of each CRC sample was calculated as $\mathrm{DRS} = \sum_{i=1}^{n} \mathrm{Coefficient}_i \times \mathrm{Expression}_i$, where $\mathrm{Coefficient}_i$ is the LASSO regression coefficient of gene $i$, $\mathrm{Expression}_i$ is the expression value of gene $i$, and $n$ is the number of genes in the domain-related risk model.
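As a worked example of the DRS formula, the function below computes the weighted sum for a single sample. The coefficients are the four values reported for the final model in Section 3.2; the negative signs for AKAP1, ORC1, and CHAF1A are an assumption inferred from their hazard ratios below 1, and the expression values in the example are hypothetical.

```python
# Four-gene model coefficients from Section 3.2. The minus signs for the three
# protective genes (HR < 1) are an assumption, restored from the hazard ratios.
DRS_COEFFICIENTS = {
    "AKAP1": -0.24085,
    "ORC1": -0.08576,
    "CHAF1A": -0.08269,
    "UHRF2": 0.17753,
}


def domain_related_score(expression, coefficients=DRS_COEFFICIENTS):
    """DRS = sum over model genes of coefficient * expression for one sample.

    expression: dict gene symbol -> expression value; a missing model gene
    raises KeyError, and genes outside the model are ignored.
    """
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


# Hypothetical expression values for one sample.
sample = {"AKAP1": 8.0, "ORC1": 6.0, "CHAF1A": 7.0, "UHRF2": 9.0, "TP53": 10.0}
score = domain_related_score(sample)  # approximately -1.42242
```

For this hypothetical sample the terms are -1.9268, -0.51456, -0.57883, and +1.59777; the TP53 entry does not contribute because it is not part of the model.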

2.4. Survival Analysis Based on Low- and High-Risk Score Categories

To investigate the impact of DRS on CRC prognosis, we conducted survival analysis. Initially, we used the median DRS as a threshold to categorize all the samples into high-risk and low-risk groups. This categorization enabled a comparison of survival outcomes between these groups.
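The median split can be written in a few lines. Samples whose score equals the median are assigned to the low-risk group here; that tie rule is an assumption, as the text does not specify it, and `split_by_median` is a hypothetical name.

```python
from statistics import median


def split_by_median(scores):
    """Return (cutoff, labels): 'high' if a sample's score exceeds the median, else 'low'."""
    cutoff = median(scores.values())
    labels = {sample: ("high" if s > cutoff else "low") for sample, s in scores.items()}
    return cutoff, labels


cutoff, groups = split_by_median({"s1": -2.5, "s2": -2.0, "s3": -1.0, "s4": -0.5})
# cutoff == -1.5; s1 and s2 are 'low', s3 and s4 are 'high'
```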
To visualize the relationship between survival time and the expression levels of prognostic signature genes across the samples, we utilized the “ggrisk” R package (version 1.3) to create scatter plots [21]. These plots effectively illustrated the distribution of gene expression levels and corresponding survival times, facilitating an exploration of DRS effects on CRC prognosis.
Moreover, we employed the Kaplan–Meier method from the “survival” R package (version 3.8-3) to compare survival prognoses between the high-risk and low-risk CRC sample groups [22,23]. This method enabled estimation of survival probabilities over time and identification of any significant differences between the groups.
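The Kaplan–Meier comparison can be sketched with the product-limit estimator and a two-group log-rank test. This is an illustrative re-implementation, not the `survival` package; the log-rank test is included as the usual companion test for Kaplan–Meier curves, although the text above does not name the test used, and the function names are hypothetical. Events are coded 1 for death and 0 for censoring.

```python
import math


def kaplan_meier(times, events):
    """Product-limit estimate: list of (t, S(t)) at each distinct event time."""
    data = sorted(zip(times, events))
    n_at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, removed = data[i][0], 0, 0
        # pool every subject (death or censoring) observed at this time
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed  # subjects censored at t still count as at risk at t
    return curve


def log_rank(times, events, groups):
    """Two-group log-rank test (groups coded 0/1); returns (chi-square, p-value)."""
    obs_minus_exp = var = 0.0
    for t in sorted({tt for tt, e in zip(times, events) if e}):
        at_risk = [(g, bool(e) and tt == t)
                   for tt, e, g in zip(times, events, groups) if tt >= t]
        n = len(at_risk)
        n1 = sum(1 for g, _ in at_risk if g == 1)
        d = sum(1 for _, died in at_risk if died)
        d1 = sum(1 for g, died in at_risk if died and g == 1)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:  # hypergeometric variance of d1
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))  # chi-square with 1 df


curve = kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1])
# [(1, 5/6), (2, 2/3), (3, 4/9), (5, 0.0)] up to floating-point rounding
```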
To assess the predictive accuracy of domain-related risk models for OS, we employed the “survivalROC” R package (version 1.0.3.1) to generate receiver operating characteristic (ROC) curves [24]. These curves visually depicted the sensitivity and specificity of survival predictions, with accuracy evaluated based on the area under the curve (AUC).
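A simplified time-dependent AUC illustrates what the ROC curves summarize at a horizon t: the probability that a patient who died by t has a higher risk score than a patient still alive after t. Unlike `survivalROC`, which corrects for censoring (by default with a nearest-neighbour estimator), this sketch simply drops patients censored before t, so it is only an approximation; `cumulative_dynamic_auc` is a hypothetical name.

```python
def cumulative_dynamic_auc(times, events, scores, horizon):
    """Naive time-dependent AUC at `horizon`.

    Cases: event at or before the horizon. Controls: observed beyond the
    horizon. Subjects censored before the horizon are dropped (no censoring
    correction). Tied scores count as half-concordant.
    """
    cases = [s for t, e, s in zip(times, events, scores) if t <= horizon and e]
    controls = [s for t, s in zip(times, scores) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at this horizon")
    concordant = 0.0
    for c in cases:
        for k in controls:
            concordant += 1.0 if c > k else 0.5 if c == k else 0.0
    return concordant / (len(cases) * len(controls))


times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 0, 1, 0, 0]
risk = [0.9, 0.8, 0.7, 0.3, 0.4, 0.1]
auc_3 = cumulative_dynamic_auc(times, events, risk, 3)  # 1.0
auc_4 = cumulative_dynamic_auc(times, events, risk, 4)  # 5/6
```

At t = 4 the patient who died at time 4 has a lower score (0.3) than one control (0.4), which is why the AUC drops from 1.0 to 5/6.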
Additionally, using the “maftools” R package (version 2.16.0) [25], we explored differences in gene mutations between the high-risk and low-risk groups identified by the domain-related risk model.

2.5. Correlation of DRS and the Clinical Features

To explore the relationship between DRS and clinical characteristics, we analyzed variations in DRS across different age groups, gender categories, AJCC-TNM stages, and pathological stages. This approach enabled us to determine if DRS correlated with specific clinical features, shedding light on the potential clinical implications of the domain-related risk model.

2.6. Comparison of Immune-Related Characteristics Between High-Risk and Low-Risk Score Groups

We investigated the differences in immune-related characteristics between the high-risk and low-risk score groups based on three immune-related gene sets: antigen presentation, immune checkpoint, and immune activation [26,27]. We assessed the ESTIMATE score, immune score, stromal score, and tumor purity of each CRC sample using the “estimate” R package (version 1.0.13) [28]. Additionally, we downloaded reference gene sets for 28 types of human immune cells from the TISIDB database (http://cis.hku.hk/TISIDB/, accessed on 21 March 2024). These gene sets were then used to analyze the variations in immune cell composition between the low-risk and high-risk score groups.
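For reference, the ESTIMATE method combines the stromal and immune scores into the ESTIMATE score and converts it to tumor purity with a fixed cosine formula published for Affymetrix data. The sketch below reproduces that conversion; `estimate_purity` is a hypothetical name, and the example shows why purity falls as stromal and immune scores rise.

```python
import math


def estimate_purity(stromal_score, immune_score):
    """ESTIMATE score (stromal + immune) and the corresponding tumor purity."""
    estimate_score = stromal_score + immune_score
    purity = math.cos(0.6049872018 + 0.0001467884 * estimate_score)
    return estimate_score, purity


low = estimate_purity(-1000.0, -1000.0)   # low infiltration: purity near 0.95
high = estimate_purity(1000.0, 1000.0)    # high infiltration: purity near 0.62
```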

2.7. The Impact of DRS on Anti-PD1/PD-L1 Immunotherapy

We used the Tumor Immune Dysfunction and Exclusion (TIDE) method to assess tumor immune evasion and predict immunotherapy outcomes [29]. Samples having high TIDE values, indicating the presence of suppressive cells hindering T cell infiltration, were classified as non-responders. Gene expression and clinical data from the IMvigor210 cohort were obtained using the “IMvigor210CoreBiologies” R package (version 1.0.0) [30]. Applying the domain-related risk model, we computed the DRS for each IMvigor210 sample and categorized them into high-risk and low-risk groups based on the median DRS values. Additionally, immune response phenotypes were identified for each sample using clinical data. We then compared DRS values between the high-risk and low-risk groups across three tumor immune response phenotypes: immune-inflamed, immune-excluded, and immune-desert.

2.8. Predicting Potential Drugs for CRC

We used the “oncoPredict” R package (version 1.2) to predict drug sensitivity and response [31,32]. This package uses the Genomics of Drug Sensitivity in Cancer (GDSC) database to predict the responses of individual CRC samples to 198 drugs, with the aim of identifying drugs to which CRC may be sensitive.

2.9. Single-Cell RNA Sequencing (scRNA-Seq) Analysis

We retrieved scRNA-seq data from GSE132257 in the GEO database and processed them with the “Seurat” R package (version 5.2.1), filtering and normalizing the data to ensure quality and consistency [33,34,35,36,37]. Highly variable genes were then selected for further analysis. Principal Component Analysis (PCA) was applied to reduce dataset dimensionality, cells were grouped into distinct clusters based on their gene expression profiles, and t-distributed stochastic neighbor embedding (t-SNE) was used to visualize the clusters. Cell types within each cluster were identified and annotated using the “SingleR” and “celldex” R packages (versions 2.2.0 and 1.10.1, respectively) [38], with the HumanPrimaryCellAtlasData reference. To assess the activity of the domain-related risk model genes in individual cells, we used the “AUCell” R package (version 1.22.0) to calculate an AUC for each cell with respect to this gene set [39,40]. Finally, the AUC values were mapped back to their respective cells; cells in which more of the model genes rank among the most highly expressed genes show higher AUC values.
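The per-cell AUC can be illustrated with a simplified recovery-curve calculation in the spirit of AUCell: genes are ranked by expression within a cell, and the score is the normalized area under the curve that counts how many gene-set genes appear among the top `max_rank` genes. AUCell's exact tie handling and normalization may differ (its default `aucMaxRank` is about 5% of the ranked genes); `gene_set_auc`, `split_cells`, and the toy genes A to E are hypothetical.

```python
def gene_set_auc(cell_expr, gene_set, max_rank):
    """Normalised area under the gene-set recovery curve for one cell.

    cell_expr: dict gene -> expression in this cell; genes are ranked from
    highest to lowest expression (ties broken alphabetically, an assumption).
    """
    ranked = sorted(cell_expr, key=lambda g: (-cell_expr[g], g))
    hits = area = 0
    for gene in ranked[:max_rank]:
        if gene in gene_set:
            hits += 1
        area += hits  # height of the recovery curve at this rank
    return area / (max_rank * len(gene_set))


def split_cells(aucs, threshold):
    """Label each cell 'high' if its AUC exceeds the threshold, else 'low'."""
    return {cell: ("high" if a > threshold else "low") for cell, a in aucs.items()}


cells = {
    "cell1": {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1},  # set genes ranked on top
    "cell2": {"A": 1, "B": 2, "C": 5, "D": 4, "E": 3},  # set genes ranked last
}
aucs = {c: gene_set_auc(expr, {"A", "B"}, max_rank=3) for c, expr in cells.items()}
labels = split_cells(aucs, threshold=0.038)  # threshold value reported in the study
```

In the study, the four model genes (AKAP1, ORC1, CHAF1A, and UHRF2) would form the gene set; here `cell1` scores 5/6 and `cell2` scores 0.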

2.10. Statistical Analysis

The t-test was used to assess whether there was a significant difference between the means of two samples, taking into account their sample means, standard deviations, and sizes. The Kruskal–Wallis test was employed to detect significant differences among three or more independent groups. The Kaplan–Meier method was applied to analyze survival differences between the high-risk and low-risk score groups. Both univariate and multivariate Cox regression analyses were performed to identify independent prognostic factors in CRC. ROC curves were utilized to evaluate the prognostic accuracy of each indicator.
All the analyses were conducted using R software (version 4.3.0). The graphs originating from the TCGA cohort dataset are marked with “TCGA cohort” at the top, whereas unlabeled graphs are derived from the GSE39582 cohort, with the exception of graphs displaying results from single-cell analysis.

3. Results

3.1. Differential Expression Analysis and GO and KEGG Pathway Enrichment Analysis

After using the “limma” R package (version 3.56.2) for analysis, we identified 93 differentially expressed CBT DCPFGs in the GSE39582 dataset (Figure 2A) and 79 in the TCGA dataset (Figure 2B). A Venn diagram analysis revealed 77 common CBT DCPFGs across both datasets, with 69 genes upregulated and 8 downregulated (Figure 2C,D).
To explore the biological functions influenced by these CBT DCPFGs, which is crucial for deciphering CRC pathogenesis, we conducted GO functional enrichment and KEGG pathway enrichment analyses on these 77 genes. The GO enrichment analysis highlighted biological functions related to chromatin remodeling and protein macromolecule methylation (Figure 2E), while the KEGG pathway enrichment analysis indicated associations with ATP-dependent chromatin remodeling, the polycomb repressive complex, hepatocellular carcinoma, lysine degradation, spliceosome, thermogenesis, and cell cycle processes (Figure 2F). These findings suggest that CBT DCPFGs likely play a pivotal role in epigenetic alterations within tumor cells, influencing chromatin remodeling, cell cycle dynamics, and, ultimately, tumor proliferation.

3.2. Constructing a Domain-Related Risk Model

We developed a risk assessment model based on CBT DCPFGs to predict CRC prognosis. In the GEO cohort, both univariate and multivariate Cox regression analyses showed that the resulting DRS could serve as an independent prognostic marker (Figure 3A,B). To build the model, after integrating the GEO expression data with the corresponding clinical information, we conducted a univariate Cox regression analysis and identified 34 genes significantly associated with CRC prognosis among the 77 differentially expressed CBT DCPFGs. To select non-redundant genes for prognosis prediction, we applied LASSO regression to these 34 genes (Figure 3C,D).
Using the minimum lambda value (lambda = 0.03744), we established a domain-specific risk model comprising four genes: AKAP1 (A-kinase anchoring protein 1), ORC1 (origin recognition complex subunit 1), CHAF1A (chromatin assembly factor 1 subunit A), and UHRF2 (ubiquitin-like with PHD and ring finger domains 2). The DRS was calculated as $\mathrm{DRS} = -0.24085 \times \mathrm{AKAP1} - 0.08576 \times \mathrm{ORC1} - 0.08269 \times \mathrm{CHAF1A} + 0.17753 \times \mathrm{UHRF2}$. Among these genes, UHRF2 was a significant high-risk factor for the survival of CRC patients, with an HR of 26.018 (95% CI: 1.711–395.574, p = 0.019). Because of the wide range of the HR values, we log10-transformed the HR and 95% CI values for visualization in the forest plot. In contrast, AKAP1, ORC1, and CHAF1A were associated with lower risk in CRC, with HR values below 1 and p-values below 0.05 (Figure 3E).
Among the 455 TCGA CRC samples with mutation data, 41 (9.01%) carried mutations in the four genes. ORC1 had the highest mutation frequency and AKAP1 the lowest. All four genes exhibited missense mutations, and ORC1 and CHAF1A also showed a small percentage of nonsense and multi-hit mutations (Figure 3F).

3.3. Survival Analysis of Low- and High-Risk Score Groups

The GEO cohort consists of 560 samples with complete survival data. Using the median DRS value (−2.000846), we categorized the samples into high-risk or low-risk groups. Among the high-risk group, 110 patients died, whereas in the low-risk group, there were 84 deaths. UHRF2 expression was notably higher in the high-risk group and lower in the low-risk group, while the other three genes showed an opposite trend (Figure 4A).
In the GEO cohort, the survival analysis indicated significantly poorer OS in the high-risk CRC samples (Figure 4B). Time-dependent ROC curves were then generated to assess the predictive capability of DRS for OS, yielding AUC values of 0.730, 0.634, and 0.604 at different time points (Figure 4C), demonstrating the DRS’s predictive potential for OS. Consistent results were also observed in the survival analysis of the TCGA cohort (Figure S1), with the corresponding time-dependent ROC curves shown in Figure S2.
Figure 4D,E illustrate genetic mutations observed in 223 high-risk CRC samples and 228 low-risk CRC samples, respectively. APC had the highest mutation frequency in both groups, primarily consisting of nonsense and multi-hit mutations. Notably, multi-hit mutations were slightly more frequent, while nonsense mutations were slightly less frequent across both sample sets.

3.4. The Association of DRS and the Clinical Features

The distribution of DRS did not vary significantly by age or gender (Figure 5A,B). However, DRS levels correlated with tumor progression in the CRC samples, with higher levels observed in patients with advanced tumors compared to those with early-stage tumors (Figure 5C).
Additionally, higher AJCC-TNM stages were consistently associated with elevated DRS in the CRC samples (Figure 5D–F). By contrast, DRS did not differ notably between the mismatch repair-proficient (pMMR) and mismatch repair-deficient (dMMR) subgroups (Figure 5G). CRC samples harboring KRAS or BRAF mutations had higher DRS values (p = 0.015 and 0.029, respectively; Figure 5H,I), whereas samples with TP53 mutations showed lower DRS (p = 0.016; Figure 5J). Lastly, proximal CRC showed higher DRS than distal CRC (Figure 5K).

3.5. Comparison of Immune-Related Characteristics in CRC Samples with Different DRS

In Figure 6A, our analysis revealed differences in the expression of immune-related genes in the CRC samples of the GSE39582 cohort. Specifically, immune checkpoint genes showed higher expression in the low-risk score group compared to the high-risk score group, while the latter exhibited higher expression levels of antigen presentation and immune activation-related genes.
Compared with the low-risk score group, the high-risk score group had significantly higher ESTIMATE scores, immune scores, and stromal scores (Figure 6B–D), indicating a higher proportion of stromal and immune cells in its tumor microenvironment. Conversely, the low-risk score group showed significantly higher tumor purity (Figure 6E), consistent with fewer stromal and immune cells. Similar analyses of the TCGA cohort yielded results consistent with those of the GSE39582 cohort (Figure S3A–D).
Immune cell infiltration in the GEO cohort differed significantly between the high-risk and low-risk score groups (Figure 6F). Activated CD8+ T cells, CD4+ T cells, T helper 17 cells, memory B cells, and monocytes did not differ significantly between the groups, whereas CD56dim natural killer cells were notably less abundant in the high-risk samples than in the low-risk samples. The remaining 22 immune cell types were significantly more abundant in the high-risk samples. These results suggest that greater immune cell infiltration in CRC may reflect heightened inflammation associated with higher risk.
Overall, these findings highlight distinct immune-related characteristics between the high-risk and low-risk score groups, encompassing differential gene expression patterns, variations in tumor microenvironment composition, and differences in tumor purity.

3.6. The Role of DRS in Anti-PD1/PD-L1 Immunotherapy

We compared the expression levels of five immune checkpoint genes between the risk groups (Figure 7A). These genes were significantly upregulated in the low-risk group, suggesting a more favorable response to immunotherapy. Figure 7B shows a positive correlation between TIDE and DRS; high TIDE values indicate likely non-response to treatment. DRS values also differed discernibly between non-responders and responders (Figure 7C). Treatment responses varied across immune phenotypes, and patients with a “desert” immune phenotype had notably higher DRS than those with “excluded” and “inflamed” phenotypes (Figure 7D). Together, these results support the DRS as a promising predictive indicator of immunotherapy response in CRC.

3.7. Prediction of Potential Drugs for CRC

Next, we predicted drug responses based on the drug sensitivity data in the GDSC database. Dactinomycin, Docetaxel, CDK 9 inhibitors, Bortezomib, and Camptothecin emerged as the top five potential drugs for CRC treatment. Figure 8 illustrates significant differences in sensitivity to two potential drugs (OSI-027 and BMS-754807) between the high-risk and low-risk groups.

3.8. Investigating the Domain-Related Risk Model at the Single-Cell Level

After filtering and normalizing the CRC sample data from the GSE132257 dataset, we categorized the cells based on the expression profiles of the top 3000 variable genes (Figure S4A). Following PCA-based dimensionality reduction, the cells were grouped into 14 clusters, numbered 0 to 13 (Figure S4B), and the expression profiles of the domain-related genes across these clusters are shown in Figure S4C. We then annotated each cluster, identifying predominant cell types such as B cells, common myeloid progenitor (CMP) cells, endothelial cells, epithelial cells, monocytes, T cells, and tissue stem cells (Figure 9A). Figure 9B details the expression levels of the domain-related genes and the percentages of cells expressing them across cell subsets.
Additionally, using the AUCell scoring algorithm with a threshold of 0.038, we divided all cells into low-AUC and high-AUC groups, identifying 587 cells with notably high AUC values (Figure S4D). Figure S4E shows the AUC value of each cell. To explore the distribution of immune cell subtypes within these groups, the cells were annotated in detail (Figure 9C). Figure 9D highlights the differences in the proportions of immune cell subtypes between the low-AUC and high-AUC groups (p < 0.044). Notably, the low-AUC group had higher proportions of B cells and T cells than the high-AUC group, suggesting that samples with low DRS might be more sensitive to anti-PD1 immunotherapy.

4. Discussion

Our comprehension of the link between histone modification and CRC is increasingly thorough. Histone modifications play a crucial role in activating oncogenes and repressing tumor suppressor genes, thereby significantly contributing to CRC development. Among these modifications, acetylation and methylation have shown the most profound impact. For instance, modifications like H3K9ac, H3K18ac, and H3K4me2 have been extensively scrutinized in CRC [41]. These modifications exert their effects through regulatory proteins that utilize specialized domains such as bromodomains, chromodomains, and Tudor domains to bind histone modifications and regulate cellular functions. This intricate regulation ultimately influences CRC progression. Hence, alterations in the regulatory patterns of proteins containing these domains are pivotal in CRC development. Investigating CRC-specific prognostic markers from a domain perspective is essential to improve prognosis and outcomes for CRC patients.
In this study, we developed a risk assessment model using four CBT DCPFGs selected through a comprehensive analysis of differentially expressed genes associated with CRC. The differentially expressed CBT DCPFGs were enriched in pathways and functions including ATP-dependent chromatin remodeling, the polycomb repressive complex, the spliceosome, and the cell cycle, all of which play significant roles in CRC development.
To validate the model, we divided the samples into high-risk and low-risk groups at the median risk score and compared patient prognoses between the groups. Patients in the high-risk group had significantly shorter survival than those in the low-risk group. Furthermore, time-dependent ROC analysis showed that the model predicted OS with moderate accuracy (AUC values of 0.604–0.730 in the GEO cohort).
Additionally, we examined the correlation between the DRS and clinicopathological features and found that patients with advanced CRC had higher risk scores. This finding further supports the predictive potential of our risk assessment model and suggests that these four genes may play an important role in the development and prognosis of CRC. The DRS may therefore be a valuable basis for further research in this area.
The DRS assigns a weight to each gene, combining the effects of risk genes and protective genes. These weights reflect the strength of the association between each gene and CRC prognosis; because they are estimated jointly in the LASSO-Cox model, each weight reflects a gene’s contribution after accounting for the other genes in the signature. By providing an integrated risk assessment, the DRS may help predict the likelihood of CRC progression and guide the choice of preventive measures or treatment strategies, making it a useful tool to support clinical decision-making.
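Assuming the DRS follows the standard LASSO-Cox linear predictor, DRS = Σᵢ βᵢ·xᵢ, where xᵢ is a gene’s (normalized) expression and βᵢ its fitted coefficient, the weighting described above amounts to a signed weighted sum: positive coefficients mark risk genes and negative coefficients mark protective genes. The fitted coefficients are not reported in this section, so the gene labels and values below are hypothetical placeholders, and the function name is ours.

```python
from typing import Mapping


def domain_risk_score(expression: Mapping[str, float],
                      coefficients: Mapping[str, float]) -> float:
    """Linear-predictor risk score: sum of coefficient * expression per gene.

    Positive coefficients raise the score (risk genes); negative coefficients
    lower it (protective genes). Raises KeyError if a model gene is missing.
    """
    missing = set(coefficients) - set(expression)
    if missing:
        raise KeyError(f"expression missing for: {sorted(missing)}")
    return sum(beta * expression[gene] for gene, beta in coefficients.items())


if __name__ == "__main__":
    # HYPOTHETICAL coefficients: placeholders, not the study's fitted model.
    coefs = {"GENE_A": 0.30, "GENE_B": -0.20}
    patients = {
        "patient_1": {"GENE_A": 2.0, "GENE_B": 1.0},
        "patient_2": {"GENE_A": 0.5, "GENE_B": 3.0},
    }
    for pid, expr in patients.items():
        print(pid, round(domain_risk_score(expr, coefs), 3))
```

This linear form contains no explicit gene-gene interaction terms; dependence between genes enters only through how the coefficients were jointly estimated during model fitting.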
Moreover, the DRS could allow clinicians to track changes in a patient’s gene expression-based risk score over time, providing data to support the early detection of, and timely intervention in, CRC. This could facilitate more precise medication choices, reducing the risk of adverse drug reactions while optimizing treatment outcomes. In this way, the DRS could provide scientific evidence and technical support for the prevention, early diagnosis, and personalized treatment of CRC in clinical settings, ultimately improving patient care and clinical efficacy.
Successful integration of the DRS model into clinical workflows will require interdisciplinary collaboration that brings together genetics, data science, and clinical medicine. The model must also be incorporated efficiently into existing electronic medical records and clinical decision support systems to enable real-time risk monitoring and timely intervention. Such integration would allow healthcare providers to make more informed, data-driven decisions, ultimately improving patient outcomes.
In our prognostic risk model, UHRF2 was identified as a high-risk factor for CRC prognosis. Previous studies have consistently linked UHRF2 to CRC development. For instance, Li et al. found that UHRF2 is predominantly expressed in intestinal crypts and adenomas and is regulated by the Wnt signaling pathway. Depletion of UHRF2 impedes tumor initiation and progression and notably suppresses the formation of primary tumor organoids. Moreover, Wnt signaling-induced UHRF2 expression enhances tumorigenesis by stabilizing TCF4, a crucial intranuclear effector of the Wnt pathway [42]. In addition, qRT-PCR has shown markedly elevated UHRF2 expression in CRC tissues [43]. These findings emphasize UHRF2’s important role in CRC progression, which may be closely linked to its structural features. UHRF2 shares structural similarities with its paralog UHRF1, and both proteins are prominently localized in H3K9me3-rich pericentric heterochromatin, indicating shared binding sites [44,45].
UHRF1 contains linked histone reader modules, namely a tandem Tudor domain (TTD) and a PHD finger; the PHD finger and the TTD recognize the methylation states of H3R2 and H3K9 in the H3 tail, respectively. The two Tudor domains of UHRF1 interlock tightly through their first two β-strands. The first Tudor domain forms an aromatic “cage” of the residues F152, Y188, and Y191 that accommodates the H3K9me3 side chain [6]. Stable isotope labeling by amino acids in cell culture (SILAC) analysis has confirmed UHRF1’s preference for nucleosomes containing H3K9me3 [46]. The TTD and PHD finger act synergistically to enhance binding to the H3K9me3-marked histone tail, and the high affinity of the TTD for H3K9me3 is crucial for recruiting DNMT1 to heterochromatic regions and thereby promoting DNA methylation. Given the structural similarity between the two paralogs, UHRF2 may engage H3K9me3-marked chromatin in a comparable manner.
Research by Lu et al. also suggests that UHRF2 could contribute to colon cancer progression and serve as a new prognostic indicator post-treatment [47]. These findings collectively underscore UHRF2’s significant role in CRC and its potential implications for patient prognosis.
The origin recognition complex (ORC) consists of six protein subunits. ORC1, a core component, plays a crucial role in recognizing DNA replication origins and initiating replication. Together with other replication factors, including CDC6 and CDT1, ORC1 forms the pre-replication complex, ensuring the timely and accurate execution of DNA replication. During the G1 phase of the cell cycle, ORC1 is recruited with the other ORC subunits to replication origins, preparing for DNA replication in the subsequent S phase.
ORC1 also interacts closely with several key cell cycle-related proteins, such as CDC6, SMC4, and GINS2, and works together with CDC6 to regulate the initiation of DNA replication. Notably, CDC6 itself is considered a CRC marker, and its interaction with ORC1 may influence CRC development.
ORC1 has been linked to various tumor-associated signaling pathways, including the Wnt, MAPK, and JAK-STAT pathways, and may influence CRC progression through its interactions with genes such as TP53 and RBX1. In the JAK-STAT pathway, ORC1 may modulate the proliferation and survival of cancer cells through interactions with genes such as CDKN1A.
Furthermore, ORC1 modulates cancer cell proliferation by regulating cell cycle progression and DNA replication signaling. It is also involved in chromatin remodeling, which affects the transcriptional accessibility of genes. In CRC, ORC1 expression correlates positively with that of several chromatin-related genes, such as the histone acetyltransferase HAT1 and POLE3, which play pivotal roles in cancer development. Through its chromatin remodeling function, ORC1 may regulate the expression of genes critical for cancer cell survival, thereby influencing tumorigenesis and cancer progression.
In addition to its direct effects on tumor cell proliferation, ORC1 may also impact cancer progression by modulating the tumor microenvironment. Survival analyses of CRC patients have shown that the expression level of ORC1 is closely associated with patient prognosis [48,49]. This correlation suggests that ORC1 expression could serve as a potential prognostic biomarker, with promising applications in clinical settings, particularly in personalized therapy and immunotherapy.
The CHAF1A gene, located at 19p13.3, encodes the largest subunit (p150) of chromatin assembly factor 1 (CAF-1), which plays a crucial role in DNA replication, gene expression regulation, and DNA mismatch repair. The CHAF1A protein, composed of 938 amino acids, interacts with several key components of chromatin dynamics. Its C-terminal region binds the p60 subunit of CAF-1 to facilitate nucleosome assembly, while its N-terminal region interacts with proliferating cell nuclear antigen (PCNA) and heterochromatin protein 1 (HP1), thereby enhancing DNA polymerase activity during replication. The protein’s KER and ED domains, together with two internal partial domains, serve as binding sites for acetylated histones H3 and H4, and these interactions are crucial for regulating the cell cycle. Beyond these structural functions, CHAF1A expression is strongly correlated with clinical prognosis in CRC, independent of factors such as patient age, cancer stage, and tumor type, underscoring its pivotal role in maintaining cellular function and its potential as a prognostic indicator in colon cancer [50,51].
AKAP1 (also known as AKAP149) plays a pivotal role in numerous essential cellular processes that are intricately linked to the regulation of apoptosis and cell survival—both of which are critical factors in the progression and prognosis of CRC. These functions suggest that AKAP1 holds significant potential as a prognostic marker for CRC. Below are the key roles of AKAP1 that underline its importance in CRC.
First, AKAP1 is involved in regulating protein kinase A (PKA) activity and plays a crucial role in the TGFβ/PKA-mediated tumor-suppressor signaling pathway. A distinctive feature of AKAP1 is its ability to bind PKA in a cAMP-independent manner, allowing it to regulate PKA activity regardless of cAMP levels. This feature may enable AKAP1 to shape cellular responses to external stimuli even when other signaling pathways are disrupted, as often occurs in CRC. By sustaining tumor-suppressive signals, AKAP1 could therefore affect tumor growth and treatment response, making it a potential prognostic marker for patient outcomes.
In addition, AKAP1 mediates the pro-apoptotic effects of OSI-906, an IGF1R kinase inhibitor. In CRC in particular, the ability of OSI-906 to induce apoptosis depends critically on the presence and function of AKAP1. AKAP1 facilitates OSI-906-induced PKA activation and modulates PKA-mediated apoptosis, highlighting its essential role in apoptosis regulation. Consequently, AKAP1 expression and activity may significantly influence the efficacy of apoptosis-inducing therapies in CRC, making it a potentially valuable marker for predicting patient responses to treatment.
Furthermore, AKAP1 acts as a key modulator of cell survival signals. OSI-906-induced PKA activation, which is regulated by AKAP1, plays a crucial role in determining cell fate. When TGFβ signaling is inhibited, this PKA activation is blocked, underscoring the importance of the TGFβ/PKA/AKAP1 axis in controlling cell survival in CRC. Because tumors that evade apoptosis and continue to signal for survival generally have a poorer prognosis, AKAP1’s regulation of these survival pathways suggests that its levels and activity could serve as an important indicator of tumor regression or survival, and thus as a marker for evaluating CRC prognosis and predicting treatment responses.
Lastly, AKAP1 regulates the interaction between XIAP (X-linked inhibitor of apoptosis protein) and survivin, two crucial proteins involved in apoptosis regulation. By modulating this interaction, AKAP1 influences the delicate balance between pro-survival and pro-apoptotic signals within the cell. This regulatory function underscores AKAP1’s importance in maintaining cellular homeostasis, particularly in CRC, where the balance between cell death and survival signals is often disrupted [52,53]. Together, these functions of AKAP1 highlight its potential as a critical biomarker for assessing CRC prognosis.
We have identified four key genes strongly associated with the prognosis of CRC patients. These genes encode proteins with specific domains that bind histone modifications, shedding light on their potential role in cancer progression. This discovery opens new avenues for research on these genes and the histone modification-binding proteins they encode. Such research could lay the foundation for comprehensive treatment strategies that target epigenetic mechanisms and associated signaling pathways, ultimately leading to more effective and tailored therapies.
Many studies on CRC prognosis focus on traditional factors such as molecular biomarker expression, tumor stage, pathological characteristics, molecular subtypes, phenotypes, and treatment selection [54,55,56,57,58]. Some recent studies have approached these questions from an epigenetic perspective, examining histone modifications, DNA methylation, and non-coding RNA expression [59,60,61,62,63,64], but they tend to take a narrow perspective and lack a more comprehensive analysis. In contrast, our research takes a holistic approach by analyzing protein-coding genes associated with chromodomains, bromodomains, and Tudor domains and systematically investigating their role in CRC prognosis. Comprehensive analyses of protein-coding genes that interact with histone methylation and acetylation remain scarce in CRC prognosis research; our study introduces a novel combination of biomarkers that considers the interactions among these genes. This approach aims to offer a more nuanced understanding of how epigenetic modifications regulate gene expression through specific proteins and collectively influence tumor progression, providing deeper biological insight into the relationship between epigenetic modifications and tumor development. Ultimately, this work broadens the scope of CRC prognosis analysis and provides theoretical support for future clinical applications and therapeutic strategies.
This study has several limitations. First, the data come primarily from publicly available gene expression datasets, which may introduce sample selection bias. Second, the drug response predictions require additional experimental validation. Third, not all clinical factors influencing CRC prognosis were thoroughly considered. Finally, batch effects within the datasets may affect the accuracy of the analysis.
To our knowledge, this study is the first to construct a prognostic signature based on genes associated with specific domains and to explore its potential roles in immune cell infiltration and immunotherapy response. This approach offers novel insights for precision medicine, has substantial translational potential, and could guide personalized clinical treatment. Future research should further validate these prognostic marker genes in multicenter clinical cohorts and investigate their combined effects with existing treatment regimens. In addition, in-depth experimental studies are needed to uncover the specific functions of these domain-related genes within the tumor microenvironment, providing a new biological foundation for targeted CRC therapies.

5. Conclusions

In conclusion, our prognostic markers demonstrate the ability to predict both the severity of CRC and the extent of immune cell infiltration. The risk assessment based on these genes introduces an innovative tool for prognostication, underscoring the necessity for deeper functional analysis of the four identified CBT DCPFGs to uncover their potential clinical significance. Further research is essential to expand on these findings and enhance our understanding in this field.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/computation13070171/s1. Figure S1: The Kaplan–Meier curves for the OS in the low- and high-DRS groups of the TCGA cohort; Figure S2: The AUC of time-dependent ROC curves at 1, 3, and 5 years of the TCGA cohort; Figure S3: (A–D) Comparisons of the ESTIMATE score, immune score, stromal score, and tumor purity between the high- and low-risk score groups of the TCGA cohort; Figure S4: (A) The 3000 genes with the largest variance. (B) Identification of cell clusters by t-SNE. (C) The expression level of each domain-related risk model gene in different clusters. (D) AUC scores of the prognostic signature; 587 cells exceeded the threshold value of 0.038. (E) The t-SNE plots based on the AUC scores of cells; cell subsets with high AUC scores are highlighted; Table S1: The CBT DCPFGs downloaded from GeneCards.

Author Contributions

Conception and design: X.C. and Y.X.; provision of study materials or patients: X.C. and G.L.; collection and assembly of data: X.C. and H.Z.; data analysis and interpretation: all authors; manuscript writing: X.C., Y.X., G.L. and H.Z.; funding acquisition and final approval of manuscript: all authors; administrative support: Z.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Science Foundation of Inner Mongolia, China Grant Program (Grant/Award numbers: 2022LHMS03015, 2024MS03054, and 2024JQ10), the National Natural Science Foundation of China Grant Program (Grant/Award numbers: 62371265 and 62261043), the basic scientific research funding for universities directly under Inner Mongolia Autonomous Region, China Grant Program (Grant/Award number: 2023RCTD023), grants from the Innovation Support Program for Overseas Returnees from Inner Mongolia Autonomous Region (grant to GL), and the 2025 Inner Mongolia Key Laboratory of Life Health and Bioinformatics Project (2025KYPT0135).

Data Availability Statement

The original data presented in the study are openly available in the GEO database (https://www.ncbi.nlm.nih.gov/geo/) and TCGA database (https://tcga-data.nci.nih.gov/tcga/).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Patel, A. Benign vs. Malignant Tumors. JAMA Oncol. 2020, 6, 1488.
  2. Kuipers, E.J.; Grady, W.M.; Lieberman, D.; Seufferlein, T.; Sung, J.J.; Boelens, P.G.; van de Velde, C.J.H.; Watanabe, T. Colorectal cancer. Nat. Rev. Dis. Primers 2015, 1, 15065.
  3. Baylin, S.B.; Jones, P.A. Epigenetic Determinants of Cancer. Cold Spring Harb. Perspect. Biol. 2016, 8, a019505.
  4. Qin, J.; Wen, B.; Liang, Y.; Yu, W.; Li, H. Histone Modifications and their Role in Colorectal Cancer. Pathol. Oncol. Res. 2020, 26, 2023–2033.
  5. Boyson, S.P.; Gao, C.; Quinn, K.; Boyd, J.; Paculova, H.; Frietze, S.; Glass, K.C. Functional roles of bromodomain proteins in cancer. Cancers 2021, 13, 606.
  6. Dahiya, R.; Naqvi, A.A.T.; Mohammad, T.; Alajmi, M.F.; Rehman, T.; Hussain, A.; Hassan, I. Investigating the structural features of chromodomain proteins in the human genome and predictive impacts of their mutations in cancers. Int. J. Biol. Macromol. 2019, 131, 1101–1116.
  7. Novillo, A.; Fernández-Santander, A.; Gaibar, M.; Galán, M.; Romero-Lorca, A.; El Abdellaoui-Soussi, F.; Arco, P.G.-D. Role of chromodomain-helicase-DNA-binding protein 4 (CHD4) in breast cancer. Front. Oncol. 2021, 11, 633233.
  8. Lu, R.; Wang, G.G. Tudor: A versatile family of histone methylation ‘readers’. Trends Biochem. Sci. 2013, 38, 546–555.
  9. Jiang, Y.; Liu, L.; Shan, W.; Yang, Z.-Q. An integrated genomic analysis of Tudor domain-containing proteins identifies PHD finger protein 20-like 1 (PHF20L1) as a candidate oncogene in breast cancer. Mol. Oncol. 2016, 10, 292–302.
  10. Zhou, B.; Wang, Y.; Zhang, L.; Shi, X.; Kong, H.; Zhang, M.; Liu, Y.; Shao, X.; Liu, Z.; Song, H.; et al. The palmitoylation of AEG-1 dynamically modulates the progression of hepatocellular carcinoma. Theranostics 2022, 12, 6898–6914.
  11. Sun, H.; Zhang, H. Lysine Methylation-Dependent Proteolysis by the Malignant Brain Tumor (MBT) Domain Proteins. Int. J. Mol. Sci. 2024, 25, 2248.
  12. Cheng, Q.; Ji, W.; Lv, Z.; Wang, W.; Xu, Z.; Chen, S.; Zhang, W.; Shao, Y.; Liu, J.; Yang, Y. Comprehensive analysis of PHF5A as a potential prognostic biomarker and therapeutic target across cancers and in hepatocellular carcinoma. BMC Cancer 2024, 24, 868.
  13. Carlson, M. hgu133plus2.db: Affymetrix Affymetrix HG-U133_Plus_2 Array Annotation Data (chip hgu133plus2), R package version 3.13.0 [Computer software]; Fred Hutchinson Cancer Center: Seattle, WA, USA, 2012.
  14. Davis, S.; Meltzer, P.S. GEOquery: A bridge between the Gene Expression Omnibus (GEO) and BioConductor. Bioinformatics 2007, 23, 1846–1847.
  15. Ritchie, M.E.; Phipson, B.; Wu, D.; Hu, Y.; Law, C.W.; Shi, W.; Smyth, G.K. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015, 43, e47.
  16. Carlson, M. org.Hs.eg.db: Genome Wide Annotation for Human, R package version 3.17.0 [Computer software]; Fred Hutchinson Cancer Center: Seattle, WA, USA, 2023.
  17. Wu, T.; Hu, E.; Xu, S.; Chen, M.; Guo, P.; Dai, Z.; Feng, T.; Zhou, L.; Tang, W.; Zhan, L.; et al. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. Innovation 2021, 2, 100141.
  18. Yu, G.; Wang, L.-G.; Han, Y.; He, Q.-Y. clusterProfiler: An R package for comparing biological themes among gene clusters. OMICS 2012, 16, 284–287.
  19. Wickham, H. ggplot2: Elegant Graphics for Data Analysis; Springer: New York, NY, USA, 2016.
  20. Friedman, J.; Hastie, T.; Tibshirani, R. Regularization Paths for Generalized Linear Models via Coordinate Descent. J. Stat. Softw. 2010, 33, 1–22.
  21. Zhang, J.; Jin, Z. ggrisk: Risk Score Plot for Cox Regression, R package version 1.3 [Computer software]; Schrödingerplatz: Vienna, Austria, 2021.
  22. Therneau, T. A Package for Survival Analysis in R, R package version 3.8-3 [Computer software]; Schrödingerplatz: Vienna, Austria, 2024.
  23. Therneau, T.M.; Grambsch, P.M. Modeling Survival Data: Extending the Cox Model; Springer: New York, NY, USA, 2000.
  24. Heagerty, P.J.; Saha-Chaudhuri, P. survivalROC: Time-Dependent ROC Curve Estimation from Censored Survival Data, R package version 1.0.3.1 [Computer software]; Schrödingerplatz: Vienna, Austria, 2022.
  25. Mayakonda, A.; Lin, D.-C.; Assenov, Y.; Plass, C.; Koeffler, H.P. Maftools: Efficient and comprehensive analysis of somatic variants in cancer. Genome Res. 2018, 28, 1747–1756.
  26. Barbie, D.A.; Tamayo, P.; Boehm, J.S.; Kim, S.Y.; Moody, S.E.; Dunn, I.F.; Schinzel, A.C.; Sandy, P.; Meylan, E.; Scholl, C.; et al. Systematic RNA interference reveals that oncogenic KRAS-driven cancers require TBK1. Nature 2009, 462, 108–112.
  27. Zeng, D.; Li, M.; Zhou, R.; Zhang, J.; Sun, H.; Shi, M.; Bin, J.; Liao, Y.; Rao, J.; Liao, W. Tumor microenvironment characterization in gastric cancer identifies prognostic and immunotherapeutically relevant gene signatures. Cancer Immunol. Res. 2019, 7, 737–750.
  28. Yoshihara, K.; Kim, H.; Verhaak, R.G. estimate: Estimate of Stromal and Immune Cells in Malignant Tumor Tissues from Expression Data, R package version 1.0.13/r21 [Computer software]; Schrödingerplatz: Vienna, Austria, 2016.
  29. Chen, Y.; Li, Z.-Y.; Zhou, G.-Q.; Sun, Y. An immune-related gene prognostic index for head and neck squamous cell carcinoma. Clin. Cancer Res. 2021, 27, 330–341.
  30. Nickles, D.; Bourgon, R. IMvigor210CoreBiologies: Data Processing and Analysis Code for the Manuscript Mariathasan et al., TGF-β Attenuates Tumor Response to PD-L1 Blockade by Contributing to Exclusion of T Cells, R package version 1.0.0 [Computer software]; Genentech: South San Francisco, CA, USA, 2019.
  31. Maeser, D.; Gruener, R. oncoPredict: Drug Response Modeling and Biomarker Discovery, R package version 1.2 [Computer software]; Schrödingerplatz: Vienna, Austria, 2024.
  32. Maeser, D.; Gruener, R.F.; Huang, R.S. oncoPredict: An R package for predicting in vivo or cancer patient drug response and biomarkers from cell line screening data. Brief. Bioinform. 2021, 22, bbab260.
  33. Hao, Y.; Stuart, T.; Kowalski, M.H.; Choudhary, S.; Hoffman, P.; Hartman, A.; Srivastava, A.; Molla, G.; Madad, S.; Fernandez-Granda, C.; et al. Dictionary learning for integrative, multimodal and scalable single-cell analysis. Nat. Biotechnol. 2024, 42, 293–304.
  34. Hao, Y.; Hao, S.; Andersen-Nissen, E.; Mauck, W.M., III; Zheng, S.; Butler, A.; Lee, M.J.; Wilk, A.J.; Darby, C.; Zager, M.; et al. Integrated analysis of multimodal single-cell data. Cell 2021, 184, 3573–3587.e29.
  35. Stuart, T.; Butler, A.; Hoffman, P.; Hafemeister, C.; Papalexi, E.; Mauck, W.M., III; Hao, Y.; Stoeckius, M.; Smibert, P.; Satija, R. Comprehensive Integration of Single-Cell Data. Cell 2019, 177, 1888–1902.e21.
  36. Butler, A.; Hoffman, P.; Smibert, P.; Papalexi, E.; Satija, R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 2018, 36, 411–420.
  37. Satija, R.; Farrell, J.A.; Gennert, D.; Schier, A.F.; Regev, A. Spatial reconstruction of single-cell gene expression data. Nat. Biotechnol. 2015, 33, 495–502.
  38. Aran, D.; Looney, A.P.; Liu, L.; Wu, E.; Fong, V.; Hsu, A.; Chak, S.; Naikawadi, R.P.; Wolters, P.J.; Abate, A.R.; et al. Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage. Nat. Immunol. 2019, 20, 163–172.
  39. Aibar, S.; González-Blas, C.B.; Moerman, T.; Huynh-Thu, V.A.; Imrichova, H.; Hulselmans, G.; Rambow, F.; Marine, J.-C.; Geurts, P.; Aerts, J.; et al. SCENIC: Single-cell regulatory network inference and clustering. Nat. Methods 2017, 14, 1083–1086.
  40. Aibar, S. AUCell: Analysis of ‘Gene Set’ Activity in Single-Cell RNA-seq Data [Computer software]; Fred Hutchinson Cancer Center: Seattle, WA, USA, 2016.
  41. Wang, R.; Xin, M.; Li, Y.; Zhang, P.; Zhang, M. The functions of histone modification enzymes in cancer. Curr. Protein Pept. Sci. 2016, 17, 438–445.
  42. Li, L.; Duan, Q.; Zeng, Z.; Zhao, J.; Lu, J.; Sun, J.; Zhang, J.; Siwko, S.; Wong, J.; Shi, T.; et al. UHRF2 promotes intestinal tumorigenesis through stabilization of TCF4 mediated Wnt/β-catenin signaling. Int. J. Cancer 2020, 147, 2239–2252.
  43. Lin, R.; Chen, R.; Ye, L.; Huang, Z.; Lin, X.; Chen, T. The Role of RNA Methylation Modification Related Genes in Prognosis and Immunotherapy of Colorectal Cancer. Int. J. Gen. Med. 2023, 16, 2133–2147.
  44. Liu, Y.; Zhang, B.; Meng, X.; Korn, M.J.; Parent, J.M.; Lu, L.-Y.; Yu, X. UHRF2 regulates local 5-methylcytosine and suppresses spontaneous seizures. Epigenetics 2017, 12, 551–560.
  45. Wang, X.; Lu, H.; Sprangers, G.; Hallstrom, T.C. UHRF2 accumulates in early G1-phase after serum stimulation or mitotic exit to extend G1 and total cell cycle length. Cell Cycle 2024, 23, 613–627.
  46. Arita, K.; Isogai, S.; Oda, T.; Unoki, M.; Sugita, K.; Sekiyama, N.; Kuwata, K.; Hamamoto, R.; Tochio, H.; Sato, M.; et al. Recognition of modification status on a histone H3 tail by linked histone reader modules of the epigenetic regulator UHRF1. Proc. Natl. Acad. Sci. USA 2012, 109, 12950–12955.
  47. Lu, S.; Yan, D.; Wu, Z.; Jiang, T.; Chen, J.; Yuan, L.; Lin, J.; Peng, Z.; Tang, H. Ubiquitin-like with PHD and ring finger domains 2 is a predictor of survival and a potential therapeutic target in colon cancer. Oncol. Rep. 2014, 31, 1802–1810.
  48. Chalbatani, G.M.; Gharagouzloo, E.; Malekraeisi, M.A.; Azizi, P.; Ebrahimi, A.; Hamblin, M.R.; Mahmoodzadeh, H.; Elkord, E.; Miri, S.R.; Sanati, M.H.; et al. The integrative multi-omics approach identifies the novel competing endogenous RNA (ceRNA) network in colorectal cancer. Sci. Rep. 2023, 13, 19454.
  49. Wu, L.; Chen, H.; Yang, C. Origin recognition complex subunit 1 (ORC1) is a potential biomarker and therapeutic target in cancer. BMC Med. Genom. 2023, 16, 243.
  50. Dasgupta, N.; Kumar Thakur, B.; Chakraborty, A.; Das, S. Butyrate-induced in vitro colonocyte differentiation network model identifies ITGB1, SYK, CDKN2A, CHAF1A, and LRP1 as the prognostic markers for colorectal cancer recurrence. Nutr. Cancer 2019, 71, 257–271.
  51. Wu, Z.; Cui, F.; Yu, F.; Peng, X.; Jiang, T.; Chen, D.; Lu, S.; Tang, H.; Peng, Z. Up-regulation of CHAF1A, a poor prognostic factor, facilitates cell proliferation of colon cancer. Biochem. Biophys. Res. Commun. 2014, 449, 208–215.
  52. Hedrick, E.D.; Agarwal, E.; Leiphrakpam, P.D.; Haferbier, K.L.; Brattain, M.G.; Chowdhury, S. Differential PKA activation and AKAP association determines cell fate in cancer cells. J. Mol. Signal 2013, 8, 10.
  53. Chowdhury, S.; Howell, G.M.; Rajput, A.; Teggart, C.A.; Brattain, L.E.; Weber, H.R.; Chowdhury, A.; Brattain, M.G.; Chandra, D. Identification of a novel TGFβ/PKA signaling transduceome in mediating control of cell survival and metastasis in colon cancer. PLoS ONE 2011, 6, e19335.
  54. Lièvre, A.; Bachet, J.-B.; Le Corre, D.; Boige, V.; Landi, B.; Emile, J.-F.; Côté, J.-F.; Tomasic, G.; Penna, C.; Ducreux, M.; et al. KRAS mutation status is predictive of response to cetuximab therapy in colorectal cancer. Cancer Res. 2006, 66, 3992–3995.
  55. Gennari, L.; Doci, R.; Rossetti, C. Prognostic factors in colorectal cancer. Hepatogastroenterology 2000, 47, 310–314.
  56. Guinney, J.; Dienstmann, R.; Wang, X.; de Reyniès, A.; Schlicker, A.; Soneson, C.; Marisa, L.; Roepman, P.; Nyamundanda, G.; Angelino, P.; et al. The consensus molecular subtypes of colorectal cancer. Nat. Med. 2015, 21, 1350–1356.
  57. André, T.; Boni, C.; Navarro, M.; Tabernero, J.; Hickish, T.; Topham, C.; Bonetti, A.; Clingan, P.; Bridgewater, J.; Rivera, F.; et al. Improved overall survival with oxaliplatin, fluorouracil, and leucovorin as adjuvant treatment in stage II or III colon cancer in the MOSAIC trial. J. Clin. Oncol. 2009, 27, 3109–3116.
  58. Ding, C.; Yang, X.; Li, S.; Zhang, E.; Fan, X.; Huang, L.; He, Z.; Sun, J.; Ma, J.; Zang, L.; et al. Exploring the role of pyroptosis in shaping the tumor microenvironment of colorectal cancer by bulk and single-cell RNA sequencing. Cancer Cell Int. 2023, 23, 95.
  59. Li, X.; Li, J.; Li, J.; Liu, N.; Zhuang, L. Development and validation of epigenetic modification-related signals for the diagnosis and prognosis of colorectal cancer. BMC Genom. 2024, 25, 51.
  60. Huang, H.; Chen, K.; Zhu, Y.; Hu, Z.; Wang, Y.; Chen, J.; Li, Y.; Li, D.; Wei, P. A multi-dimensional approach to unravel the intricacies of lactylation related signature for prognostic and therapeutic insight in colorectal cancer. J. Transl. Med. 2024, 22, 211.
  61. Müller, D.; Győrffy, B. DNA methylation-based diagnostic, prognostic, and predictive biomarkers in colorectal cancer. Biochim. Biophys. Acta Rev. Cancer 2022, 1877, 188722.
  62. Vuletić, A.; Mirjačić Martinović, K.; Spasić, J. Role of Histone Deacetylase 6 and Histone Deacetylase 6 Inhibition in Colorectal Cancer. Pharmaceutics 2023, 16, 54.
  63. Li, C.; Song, J.; Guo, Z.; Gong, Y.; Zhang, T.; Huang, J.; Cheng, R.; Yu, X.; Li, Y.; Chen, L.; et al. EZH2 Inhibitors Suppress Colorectal Cancer by Regulating Macrophage Polarization in the Tumor Microenvironment. Front. Immunol. 2022, 13, 857808.
  64. Conte, M.; Di Mauro, A.; Capasso, L.; Montella, L.; De Simone, M.; Nebbioso, A.; Altucci, L. Targeting HDAC2-Mediated Immune Regulation to Overcome Therapeutic Resistance in Mutant Colorectal Cancer. Cancers 2023, 15, 1960.
Figure 1. Flowchart of the study. First, we identified differentially expressed CBT DCPFGs using transcriptomic and clinical data from normal colorectal tissues and CRC patients in the GSE39582 dataset. Next, we performed enrichment analysis on these differentially expressed genes and constructed a domain-related prognostic risk model through LASSO-Cox regression analysis. These steps were validated using the TCGA CRC dataset and the GSE132257 single-cell dataset. Finally, using the constructed risk model, we performed survival analysis, examined the association between DRS and clinical characteristics, compared immune-related features between CRC samples with different DRS, explored the role of DRS in anti-PD-1/PD-L1 immunotherapy, and predicted potential therapeutic drugs for CRC.
Figure 2. Differential expression and enrichment analyses of CBT DCPFGs. (A,B) Differential expression analysis of normal vs. tumor samples in the GSE39582 and TCGA-CRC datasets, respectively. (C,D) Upregulated and downregulated CBT DCPFGs shared by the training cohort (GSE39582) and the testing cohort (TCGA). (E) GO enrichment analysis of CBT DCPFGs. (F) KEGG pathway enrichment analysis of CBT DCPFGs.
Figure 3. Screening of CBT DCPFGs and construction of the risk assessment model. (A) Univariate Cox regression analysis to identify prognosis-associated genes. (B) Multivariate Cox regression analysis to identify independent prognostic indicators. (C) LASSO coefficient profiles of the screened differentially expressed CBT DCPFGs. (D) LASSO regression with 10-fold cross-validation; the dotted line marks the number of genes selected at lambda.min (0.03744). (E) Forest plot of the Cox analysis of the four independent prognostic genes, with HRs and 95% confidence intervals (CIs) shown on a log10 scale. (F) Mutation landscape of the four prognostic genes.
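Figure 3 shows the signature being fitted by LASSO-Cox regression, but this section does not state the scoring formula. In signatures of this kind the risk score is commonly a weighted sum, DRS = β₁x₁ + β₂x₂ + β₃x₃ + β₄x₄, where xᵢ is the expression of signature gene i and βᵢ its fitted coefficient, with patients then split into high- and low-risk groups at a cutoff. The sketch below assumes that linear form and a median cutoff; the coefficient values, toy expression data, and function names are hypothetical and are not the fitted values from this study.

```python
from statistics import median

# HYPOTHETICAL coefficients for illustration only; these are not the
# fitted LASSO-Cox coefficients of the published signature.
COEFFICIENTS = {"AKAP1": 0.10, "ORC1": -0.20, "CHAF1A": 0.30, "UHRF2": 0.25}


def risk_score(expression, coefs=COEFFICIENTS):
    """Weighted sum of signature-gene expression (assumed linear form)."""
    return sum(beta * expression[gene] for gene, beta in coefs.items())


def split_by_median(scores):
    """Label a sample 'high' if its score exceeds the median, else 'low' (assumed cutoff)."""
    cutoff = median(scores.values())
    return {sample: ("high" if s > cutoff else "low") for sample, s in scores.items()}


# Toy expression values (made up) for three samples.
samples = {
    "P1": {"AKAP1": 1.0, "ORC1": 1.0, "CHAF1A": 1.0, "UHRF2": 1.0},
    "P2": {"AKAP1": 2.0, "ORC1": 2.0, "CHAF1A": 2.0, "UHRF2": 2.0},
    "P3": {"AKAP1": 0.0, "ORC1": 0.0, "CHAF1A": 0.0, "UHRF2": 0.0},
}
scores = {pid: risk_score(expr) for pid, expr in samples.items()}
groups = split_by_median(scores)
print(groups)  # {'P1': 'low', 'P2': 'high', 'P3': 'low'}
```

In this sketch a sample whose score equals the median is placed in the low-risk group; the cutoff rule actually used in the study is not given in this section.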
Figure 4. (A) Distribution of risk scores, expression levels of the four domain-related genes in the model, and survival status of CRC patients in the high- and low-risk groups of the GEO dataset. (B) Kaplan–Meier curves of overall survival (OS) for the high- and low-risk groups. (C) AUCs of time-dependent ROC curves at 1, 3, and 5 years. (D) Mutation landscape of the high-risk group. (E) Mutation landscape of the low-risk group.
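Figure 4B compares overall survival between the risk groups with Kaplan–Meier curves. The product-limit estimator behind such curves is S(t) = ∏ (1 − dᵢ/nᵢ) over event times tᵢ ≤ t, where dᵢ is the number of deaths at tᵢ and nᵢ the number of patients still at risk just before tᵢ. A minimal plain-Python sketch follows; the function name and toy data are illustrative, and analyses like this are normally run with an established package such as `survfit` in the R `survival` package.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function S(t).

    times  -- follow-up time per patient
    events -- 1 if death was observed, 0 if the patient was censored
    Returns (time, survival) pairs at each time with at least one death.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients sharing this time (deaths and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Everyone observed at t (died or censored) leaves the risk set.
        n_at_risk -= removed
    return curve


# Toy cohort: times in months, 1 = death, 0 = censored.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
print([(t, round(s, 2)) for t, s in curve])  # [(1, 0.8), (2, 0.6), (3, 0.3)]
```

Patients censored at a death time are counted as still at risk at that time, which is the usual convention; computing one curve per risk group gives the two curves compared in a plot like Figure 4B.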
Figure 5. The association between DRS and clinical features in the GEO cohort. (A–K) Comparison of DRS in CRC samples with respect to age, gender, pathological stage, AJCC-T stage, AJCC-N stage, AJCC-M stage, MMR status, KRAS mutation status, BRAF mutation status, TP53 mutation status, and tumor location, respectively.
Figure 6. The DRS correlates with the immune cell infiltration pattern in the GEO cohort. (A) Comparison of the expression levels of antigen-presentation, immune-checkpoint, and immune-activation gene sets between the high- and low-risk groups. (B–E) Comparison of the ESTIMATE score, immune score, stromal score, and tumor purity between the high- and low-risk groups. (F) Abundance of 28 immune cell types in the tumor microenvironment (TME). **** p < 0.0001, *** p < 0.001, ** p < 0.01, * p < 0.05, ns: not significant.
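The figure legends use a fixed star notation for p-values. A minimal mapping that makes the thresholds and boundary behaviour explicit (the inequalities are strict, so p = 0.05 is reported as ns) might look like this; the function name is illustrative.

```python
def significance_label(p):
    """Return the star label for a p-value using the legends' strict thresholds."""
    for threshold, label in ((0.0001, "****"), (0.001, "***"), (0.01, "**"), (0.05, "*")):
        if p < threshold:
            return label
    return "ns"  # not significant


for p in (0.00005, 0.0005, 0.005, 0.03, 0.05, 0.2):
    print(p, significance_label(p))
```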
Figure 7. Role of the DRS model in anti-PD-1/PD-L1 immunotherapy. (A) Differential expression of immune checkpoint genes between the low- and high-DRS subgroups of the GSE39582 cohort (Wilcoxon test). (B) Correlation between TIDE and DRS in the GSE39582 cohort (Spearman test). (C) Comparison of DRS between non-responders and responders to immunotherapy in the IMvigor210 cohort (t-test). (D) DRS across immune phenotypes in the IMvigor210 cohort (Kruskal–Wallis test). *** p < 0.001, ** p < 0.01, * p < 0.05, ns: not significant.
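Figure 7B relates TIDE scores to DRS with a Spearman test, which is a Pearson correlation computed on ranks, with tied values given their average rank. The sketch below computes the coefficient ρ only (no p-value); the function names and toy values are illustrative, and in practice `scipy.stats.spearmanr` or R's `cor.test(method = "spearman")` would be used.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # i and j are 0-based positions
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Made-up TIDE and DRS values for four samples (two tied TIDE values).
tide = [0.2, 0.5, 0.5, 0.9]
drs = [1.1, 1.4, 1.3, 2.0]
print(round(spearman_rho(tide, drs), 3))  # 0.949
```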
Figure 8. Differences in predicted sensitivity to OSI-027 and BMS-754807, based on the GDSC database. **** p < 0.0001, *** p < 0.001, ** p < 0.01, * p < 0.05, ns: not significant.
Figure 9. Investigation of the domain-related risk model at the single-cell level. (A) Cell annotation of clusters identified by t-SNE. (B) Expression of the four prognostic signature genes across cell subsets. (C) Cell annotation of immune cell subsets. (D) Proportions of the cell subsets in the low- and high-AUC groups.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Cui, X.; Xing, Y.; Liu, G.; Zhao, H.; Yang, Z. Construction and Evaluation of a Domain-Related Risk Model for Prognosis Prediction in Colorectal Cancer. Computation 2025, 13, 171. https://doi.org/10.3390/computation13070171

AMA Style

Cui X, Xing Y, Liu G, Zhao H, Yang Z. Construction and Evaluation of a Domain-Related Risk Model for Prognosis Prediction in Colorectal Cancer. Computation. 2025; 13(7):171. https://doi.org/10.3390/computation13070171

Chicago/Turabian Style

Cui, Xiangjun, Yongqiang Xing, Guoqing Liu, Hongyu Zhao, and Zhenhua Yang. 2025. "Construction and Evaluation of a Domain-Related Risk Model for Prognosis Prediction in Colorectal Cancer" Computation 13, no. 7: 171. https://doi.org/10.3390/computation13070171

APA Style

Cui, X., Xing, Y., Liu, G., Zhao, H., & Yang, Z. (2025). Construction and Evaluation of a Domain-Related Risk Model for Prognosis Prediction in Colorectal Cancer. Computation, 13(7), 171. https://doi.org/10.3390/computation13070171
