The Construction of a Multi-Gene Risk Model for Colon Cancer Prognosis and Drug Treatments Prediction

In clinical practice, colon cancer is a prevalent malignant tumor of the digestive system, characterized by a complex and progressive process involving multiple genes and molecular pathways. Historically, research efforts have primarily focused on investigating individual genes; however, our current study aims to explore the collective impact of multiple genes on colon cancer and to identify potential therapeutic targets associated with these genes. For this research, we acquired the gene expression profiles and RNA sequencing data of colon cancer from TCGA. Subsequently, we conducted differential gene expression analysis using R, followed by GO and KEGG pathway enrichment analyses. To construct a protein–protein interaction (PPI) network, we selected survival-related genes using the log-rank test and single-factor Cox regression analysis. Additionally, we performed LASSO regression analysis, immune infiltration analysis, mutation analysis, and cMAP analysis, as well as an investigation into ferroptosis. Our differential expression and survival analyses identified 47 hub genes, and subsequent LASSO regression analysis refined the focus to 23 key genes. These genes are closely linked to cancer metastasis, proliferation, apoptosis, cell cycle regulation, signal transduction, cancer microenvironment, immunotherapy, and neurodevelopment. Overall, the hub genes discovered in our study are pivotal in colon cancer and are anticipated to serve as important biological markers for the diagnosis and treatment of the disease.


Introduction
Colon cancer (CC) ranks as the fourth deadliest cancer globally [1].The accumulation of diverse genetic and epigenetic alterations in colonic epithelial cells is a fundamental process underlying the onset and advancement of CC [2].Colon cancer (CC) is a multifaceted and progressive condition that implicates numerous genes and stages.Several biomarkers associated with the survival and prognosis of CC have been investigated previously.Nonetheless, individual genes or biomarkers alone are insufficient for accurately predicting the outcomes of cancer patients [3,4].
The study involved analyzing the transcriptome sequencing data from TCGA-COAD and normal samples.Various methods were employed to analyze differentially expressed genes in CC, such as Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis, survival analysis, protein-protein interaction (PPI) network construction, LASSO regression analysis, tumor microenvironment analysis, Connectivity Map (cMAP) analysis, and ferroptosis analysis.These identified factors and pathways can function as biomarkers for cancer development and potential targets for the clinical treatment of CC.

DEGs Identification
In this study, we conducted principal component analysis (PCA) on the TCGA-COAD dataset to differentiate between cancerous and normal tissues.The PCA plot displayed a clear separation between the two groups, suggesting distinct gene expression profiles in cancer and normal tissues (Figure 1B).After preprocessing the raw dataset, we utilized the three main R packages (DESeq2, edgeR, and limma) to identify differentially expressed genes (DEGs) in the TCGA-COAD dataset independently and create volcano plots (Figure 1A).The overlap of the DEGs identified by the three primary R packages was determined.Next, Venn diagrams were created separately for the upregulated and downregulated genes (Figure 1B).A heatmap was generated using the TCGA-COAD dataset to compare gene expression profiles between cancerous and normal tissues.The heatmap revealed distinct patterns of gene expression, emphasizing the discrepancies between cancer and normal tissues (Figure 1C).

Gene Set Enrichment Analysis
We performed GO enrichment and KEGG pathway analysis using the clusterProfiler package in R version is R 4.3.2 with 746 upregulated DEGs and 1083 downregulated DEGs.We visualizes the results using the ggplot2 package (Figure 2).Regarding biological process (BP) enrichment, the results suggest that DEGs are primarily involved in the production of molecular mediators of the immune response and immunoglobulin production (Figure 2A).For cellular component (CC) enrichment, DEGs were predominantly enriched in the immunoglobulin complex, collagen-containing extracellular matrix, apical plasma membrane, cell projection membrane, and monoatomic ion channel complex (Figure 2B).Regarding molecular function (MF) enrichment, DEGs were predominantly enriched in metal ion transmembrane transporter activity, monoatomic ion channel activity, glycosaminoglycan binding, extracellular matrix structural constituent, serine-type peptidase activity, serine hydrolase activity, serine-type endopeptidase activity, and metallopeptidase activity (Figure 2C).In the KEGG pathway analysis, the DEGs were grouped in the neuroactive ligand-receptor interaction, cytokine-cytokine receptor interaction, cAMP signaling pathway, calcium signaling pathway, and cell adhesion molecules (Figure 2D).
tidase activity, serine hydrolase activity, serine-type endopeptidase activity, and metallopeptidase activity (Figure 2C).In the KEGG pathway analysis, the DEGs were grouped in the neuroactive ligand-receptor interaction, cytokine-cytokine receptor interaction, cAMP signaling pathway, calcium signaling pathway, and cell adhesion molecules (Figure 2D).

Survival Curves of 47 Hub Genes and Their Expression Levels in COAD
The survival curves and differential expression profiles of the 47 hub genes are presented in the Supplementary Figures (Supplementary Figures S1-S3).Here, only the Kaplan-Meier survival curves and differential expression profiles of the important genes CDKN2A, CXCL1, CLCA1, MMP3, and MMP1 are shown (Figure 3).According to the research results, we observed that in CC tissues, the expression level of the CDKN2A gene exceeds the expression in normal tissues.However, the patients' survival rates with low expression outperform those of patients with high expression.Likewise, the expression of the CXCL1 gene in CC exceeds that in normal tissues, and the patients' survival rates with high expression are markedly higher than those with low expression.On the other hand, the expression level of the CLCA1 gene in CC tissues is lower than in normal tissues, but the patients' survival rates with high expression are markedly higher than those with low expression.Additionally, the expression of the MMP3 and MMP1 genes is upregulated in CC tissues, and the patients' survival rates with high expression are markedly higher than those with low expression.

PPI Network Construction
To explore the interactions among the 47 hub genes, we initially analyzed them using the STRING database and subsequently visualized their interactions as a protein-protein interaction (PPI) network in Cytoscape.The network comprises 29 nodes and 44 edges, as depicted in Figure 4A.We utilized cytoHubba to analyze the top 15 central genes in the PPI network (Figure 4B).Based on the findings from the protein-protein interaction network (PPI) analysis, we have identified notable interactions among 15 genes (CDKN2A, CXCL1, CLCA1, MMP3, MMP1, ITLN1, NME1, HEPACAM2, GRIK5, AOC3, ULBP2, ACSL6, STC2, GABRD, and CA4) in CC.These findings suggest that these genes may perform critical functions in the pathophysiological mechanisms of CC.

LASSO Regression Analysis
Following LASSO regression analysis, lasso.minidentified 23 genes, whereas lasso.1seidentified 18 genes (Figure 5A,B).We generated distinct AUC curves for lasso.minand lasso.1se(Figure 5C).We then plotted the AUC curves for the 23 selected genes by lasso.min at one, three, and five years (Figure 5D).We conducted a Kaplan-Meier survival analysis comparing high-risk and low-risk groups (Figure 5E).We generated a series of three linked risk factor plots, each displaying unique information: (a) predicted values for each patient sorted in ascending order, (b) patient survival time with color coding for living and deceased patients, and (c) heatmap illustrating the gene expression levels for selected genes in each sample (Figure 5F).Finally, we created a risk forest plot for the 23 selected genes (Figure 5G).After conducting LASSO regression, the AUC survival curve for the 23 selected genes was determined to be 0.81.The AUC prediction for multi-year survival rates yielded values of 0.8 at one year, 0.76 at three years, and 0.81 at five years.Upon stratifying the samples into high-and low-risk groups, it became

LASSO Regression Analysis
Following LASSO regression analysis, lasso.minidentified 23 genes, whereas lasso.1seidentified 18 genes (Figure 5A,B).We generated distinct AUC curves for lasso.minand lasso.1se(Figure 5C).We then plotted the AUC curves for the 23 selected genes by lasso.min at one, three, and five years (Figure 5D).We conducted a Kaplan-Meier survival analysis comparing high-risk and low-risk groups (Figure 5E).We generated a series of three linked risk factor plots, each displaying unique information: (a) predicted values for each patient sorted in ascending order, (b) patient survival time with color coding for living and deceased patients, and (c) heatmap illustrating the gene expression levels for selected genes in each sample (Figure 5F).Finally, we created a risk forest plot for the 23 selected genes (Figure 5G).After conducting LASSO regression, the AUC survival curve for the 23 selected genes was determined to be 0.81.The AUC prediction for multi-year survival rates yielded values of 0.8 at one year, 0.76 at three years, and 0.81 at five years.Upon stratifying the samples into high-and low-risk groups, it became evident from the Kaplan-Meier survival curves that the low-risk group exhibited notably higher survival rates than the high-risk group.Furthermore, a three-way interactive visualization of risk factors was developed.The reliability of the predictive model established by these 23 genes was assessed using a random forest model, revealing a very small p-value and a concordance index value of 0.79.These findings collectively indicate the high reliability of the predictive model.

Immune Cell Infiltration Analysis
To explore immune infiltration in cancerous and normal tissues within TCGA-COAD, we analyzed the processed expression matrix using the XCELL algorithm.We then created differential boxplots for 64 immune cell types (Figure 6A).We further investigated the relationship between the genes selected by lasso.minand the 64 immune cell types, generating a correlation heatmap (Figure 6B).Finally, we analyzed the differential expression of immune checkpoint genes in cancerous and normal tissues of TCGA-COAD and visualized the results with a corresponding lollipop chart (logFC ≥ 1, p < 0.05) (Figure 6C).According to the research results, we observed a significant decrease in CD8+ and CD4+ T cells in the adaptive immune cell population, while regulatory T cells (Tregs) significantly increased.In the myeloid immune cell population, M1 macrophages showed a significant decrease, whereas M2 macrophages exhibited a significant increase in CC.Furthermore, neutrophils, monocytes, dendritic cells, and mast cells showed a significant decrease, while natural killer (NK) cells showed an increase.In the analysis of the gene correlations selected by the LASSO model among 64 immune cells, we found that CCBE1, ZBTB7C, TPSG1, and CLDN23 were positively correlated with the majority of immune cells, whereas GABRE and TSPEAR were negatively correlated.In the analysis of immune checkpoint expression, we observed significant upregulation of TNFSF9, VTCN1, CD74, TDO2, TNFSF4, BTNL9, and CTLA4 in CC cells, while BTNL3, CEACAM1, CD209, CD160, KIR2DL4, BTLA, CD27, CD96, KIR3DL2, and CD40LG showed significant downregulation.

Mutation Analysis
To examine the gene mutation status in TCGA-COAD, we used the TCGAmutations package to retrieve and analyze the data, followed by generating an overview of the gene mutation landscape (Figure 7A).Additionally, we explored the mutation status of genes selected by lasso.minand depicted their respective mutation spectrum plot (Figure 7B).In the analysis of mutations, we found that missense mutations are the most prevalent in colorectal cancer, with SNP being the most common mutation type.The primary mutated genes comprise TTN, APC, MUC16, SYNE1, TP53, FAT4, and KRAS.Furthermore, ZBTB7C, WDR78, CCBE1, WDR72, and MMP3 demonstrate the highest mutation frequencies among the genes identified by the LASSO regression model.

Connectivity Map (cMAP) Analysis
To explore potential small molecule drugs with therapeutic potential in colon cancer patients, 150 upregulated and 150 downregulated differentially expressed genes (DEGs) were separately entered into the Connectivity Map (cMAP) database to identify small molecule compounds capable of reversing the expression changes linked to colorectal cancer-related pathogenic genes.After thorough analysis, the top nine compounds with the most negative scores, such as ISOX, vorinostat, NVP-AUY922, selumetinib, AS-703026, THM-I-94, NVP-TAE684, trichostatin-a, and scriptaid, were recognized as potential therapeutic agents for colon cancer treatment (Figure 8A).The chemical structures of these nine small molecule drugs are depicted in Figure 8B.

Ferroptosis Analysis
To explore the association between colon cancer and ferroptosis, we utilized FerrDB to gather ferroptosis-related genes.The expression variances of ferroptosis driver genes (Figure 9A) and suppressor genes (Figure 9B) in the TCGA-COAD dataset were visualized (logFC ≥ 1, p < 0.05).Subsequently, the associated protein-protein interaction (PPI) network was depicted (logFC ≥ 2, p < 0.05) (Figure 9C).CytoHubba was used to analyze the top 10 central genes in the PPI network (Figure 9D).In the ferroptosis-related analysis results, we found that in the ferroptosis-driving genes, the expression of genes such as H19, MIOX, ALOXE3, NOX4, PVT1, and CDKN2A was significantly upregulated, whereas LIFR, CPEB1, and MT1DP were significantly downregulated.In the ferroptosis-suppressing genes, genes such as CA9, ETV4, LINC01833, HCAR1, TFAP2A, and GDF15 were significantly upregulated, while MT1G and PDK4 genes were significantly downregulated.Additionally, in the context of the interaction of ferroptosis genes (logFC > 2) in CC, the top 10 most important genes were identified as CDKN2A, GDF15, MYCN, SCD, SLC7AL1, PDK4, NOX4, LCN2, CP, and CA9.molecule compounds capable of reversing the expression changes linked to colorectal cancer-related pathogenic genes.After thorough analysis, the top nine compounds with the most negative scores, such as ISOX, vorinostat, NVP-AUY922, selumetinib, AS-703026, THM-I-94, NVP-TAE684, trichostatin-a, and scriptaid, were recognized as potential therapeutic agents for colon cancer treatment (Figure 8A).The chemical structures of these nine small molecule drugs are depicted in Figure 8B.

Discussion
In this study, we obtained and analyzed the TCGA-COAD dataset.Utilizing the differential analysis R packages DESeq2, edgeR, and limma, we identified 746 upregulated genes and 1083 downregulated genes, which constitute a highly reliable set of differen-

Discussion
In this study, we obtained and analyzed the TCGA-COAD dataset.Utilizing the differential analysis R packages DESeq2, edgeR, and limma, we identified 746 upregulated genes and 1083 downregulated genes, which constitute a highly reliable set of differentially expressed genes.Subsequently, we used these genes to address several questions through bioinformatics analyses: What are the significant Gene Ontology (GO) terms and KEGG signaling pathways in colon cancer (CC)?Which genes in these DEGs are linked to survival and act as hub genes?How do these hub genes interact?How can they be used for cancer prediction?What are the correlations between key genes from the prediction model and tumor microenvironment and gene mutations?How can drug screening be conducted based on these DEGs?Lastly, what is the relationship between CC and ferroptosis?
The analysis of biological process (BP) enrichment revealed that the differentially expressed genes (DEGs) are mainly linked to the production of molecular mediators involved in the immune response and immunoglobulin synthesis.These processes are closely associated with immune evasion in cancer.Producing immune response molecules is a crucial defense mechanism of the body against pathogens and cancer cells.Nevertheless, cancer cells can manipulate the production of immune response molecules and the generation of immunoglobulins through various mechanisms, allowing them to evade immune system attacks.
Within the cellular component (CC) enrichment analysis, the differentially expressed genes (DEGs) showed notable enrichment in the immunoglobulin complex, collagencontaining extracellular matrix, apical plasma membrane, cell projection membrane, and monoatomic ion channel complex.Immunoglobulin, a crucial blood protein, plays a significant role in the immune system by recognizing and eliminating pathogens and abnormal cells.Collagen, a vital structural protein in the extracellular matrix, is crucial for cell support and structure, and its abnormal accumulation may contribute to tumor invasion and metastasis.The apical membrane, a cellular surface membrane, is responsible for maintaining cell morphology and signal transduction, with its abnormal features potentially linked to enhanced invasiveness in specific cancer cells.The cell projection membrane, a cell surface structure, influences cell adhesion and movement, playing a crucial role in tumor invasion and metastasis.Ion channels, protein channels located on the cell membrane, control the ion balance within and outside the cell, with sodium channels playing a crucial role in tumor development.
In terms of molecular function (MF) enrichment, the differentially expressed genes (DEGs) exhibited significant enrichment in metal ion transmembrane transporter activity, monoatomic ion channel activity, glycosaminoglycan binding, extracellular matrix structural constituent, serine-type peptidase activity, serine hydrolase activity, serine-type endopeptidase activity, and metallopeptidase activity.Metal ions have a critical regulatory function within cells, and dysregulated expression of specific metal ion channel proteins may be linked to cancer development and progression through the modulation of intracellular and extracellular ion balance, impacting processes like cell proliferation and apoptosis.Monoatomic ion channel activity entails the specific transport of individual atomic ions (e.g., sodium, potassium, calcium) through channel proteins on the cell membrane.Aberrant ion channel activity can disturb intracellular and extracellular ion balance, potentially fostering cancer progression.Glycosaminoglycans are integral components of the extracellular matrix, and aberrant glycosaminoglycan binding capacity may be associated with tumor invasion and metastasis.Certain glycosaminoglycan receptors on cancer cell surfaces can enhance tumor cell adhesion, invasion, and metastasis.The extracellular matrix provides structural support outside cells, consisting of various proteins and polysaccharides.Defects in extracellular matrix components may promote tumor invasion and metastasis by influencing cell-tissue interactions.With regard to serine protease activity, serine hydrolase activity, serine endopeptidase activity, and metallopeptidase activity, these enzymes participate in protein degradation and modification.Dysregulated protease activities can disrupt intracellular signaling pathways, promoting cancer progression.
The KEGG pathway analysis indicated a significant association of the DEGs with neuroactive ligand-receptor interaction, cytokine-cytokine receptor interaction, the cAMP signaling pathway, the calcium signaling pathway, and cell adhesion molecules.The neuroactive ligand-receptor interaction and cytokine-cytokine receptor interaction pathways are vital in cell signaling, governing growth factors, neurotransmitters, and hormones.Dysregulated signaling may result in uncontrolled cell proliferation, inhibited apoptosis, and tumor formation.The cAMP signaling pathway and calcium signaling pathway regulate intracellular signal transduction and modulate intracellular calcium ion levels.Abnormal cAMP and calcium signaling are closely associated with cancer initiation and progression.Cell adhesion molecules are crucial for cell-cell adhesion and interaction, influencing processes such as cell migration, invasion, and metastasis.These processes play a crucial role in tumor development and dissemination.
After conducting the log-rank test and single-factor Cox regression analysis (with log rank p < 0.05 and Cox results p < 0.05), a total of 47 survival-related genes were identified from the highly credible DEGs.By analyzing the protein-protein interaction (PPI) network, 15 central genes were identified among the 47 hub genes.
The onset of CC involves numerous factors and genes.Processes such as cell growth, apoptosis, DNA repair, and signal transduction can be impacted by abnormal gene expression and mutations, resulting in different molecular mechanisms that influence the onset of CC.The CDKN2A gene encodes two tumor suppressor proteins, p16 and p14, which play crucial roles in regulating the cell cycle and metabolism in melanoma.Furthermore, these proteins are closely linked to immune infiltration in the tumor microenvironment [5].Elevated CDKN2A expression in CC is linked to an unfavorable prognosis.CXCL1 can promote the migration and invasiveness of breast cancer by activating the transcription of SOX4 via the NF-κB pathway, resulting in the subsequent epithelial-mesenchymal transition (EMT) process [6].CXCL1 expression is increased in CC.Elevated CLCA1 expression levels can inhibit the invasiveness of colorectal cancer (CRC).CLCA1 may function as a tumor suppressor by inhibiting the Wnt/β-catenin signaling pathway and the epithelialmesenchymal transition (EMT) process [7].Reduced CLCA1 expression in CC is linked to a favorable prognosis.MMP1 plays a crucial role in promoting tumor progression in large cell carcinoma of the lung by inducing fibroblast senescence [8].MMP3 initially collaborates with oncogenic KRAS to drive tumorigenesis in pancreatic cancer and activate the stromal microenvironment.Subsequently, it becomes a key driver in promoting tumor invasion and progression [9].Ovarian cancer (OC) cells have been observed to modify mesothelial cells in visceral adipose tissue by downregulating ITLN1, thereby enhancing the invasion potential and proliferation of OC cells in the omental microenvironment [10].These genes collectively influence the onset of CC through their involvement in the cell cycle, inflammatory response, intercellular signaling, and the tumor microenvironment.Their abnormal expression and regulatory relationships constitute the complex mechanisms underlying the occurrence, development, and metastasis of CC.In-depth exploration of these mechanisms is essential for understanding the onset process of CC, identifying potential therapeutic targets, and developing personalized treatment strategies.
In the field of CC research, accurately predicting the survival prognosis of patients is crucial for developing personalized treatment plans and providing appropriate medical care.The objective of this study is to utilize the LASSO regression model to establish a multi-gene prediction model for predicting the survival prognosis of CC patients.Through the analysis and integration of multiple genes related to the survival prognosis of CC patients, we aim to provide a reliable prediction tool for clinical practice, in order to better understand the disease progression of patients and support medical decision-making, with the goal of early detection leading to early treatment.In this study, we used the LASSO regression model to select and evaluate CC-related genes to identify those with significant predictive capabilities for patient survival prognosis.Compared to traditional prediction models, the LASSO regression model is highly favored for its ability to effectively handle high-dimensional data and reduce model complexity.Through this model, we successfully identified a set of key genes closely associated with the survival prognosis of CC patients, providing important gene markers for the construction of the prediction model.Furthermore, our research revealed that, compared to previous studies, our analysis not only clarified the key genes related to the survival prognosis of CC but also uncovered the underlying potential mechanisms behind CC survival prognosis, providing valuable clues for further in-depth research.Our findings not only expanded the understanding of CC survival prognosis but also offered a new perspective for personalized treatment of CC in terms of potential drug targets or diagnostic markers.
The tumor microenvironment (TME) is the origin and residence of tumor cells, comprising not only the tumor cells themselves but also neighboring cells like fibroblasts, immune cells, inflammatory cells, and vascular cells collectively known as cancer-associated stromal and immune cells.Moreover, it includes the secretory products of these cells, such as cytokines, chemokines, and non-cellular components of the extracellular matrix (ECM).Tumor growth and metastasis are intricately linked to the surrounding environment.Key features of the tumor microenvironment include hypoxia, chronic inflammation, and immune suppression, which collaboratively enhance the development and growth of tumor cells.During tumor development, local immune cells play a pivotal role in shaping the composition of the tumor microenvironment.Different cell types in the TME can display either tumor-suppressive or tumor-supportive properties.The diverse immune and stromal cells in the TME, along with their secretory products and the extracellular matrix, collectively impact tumor development [11].
There is a notable decrease in CD8+ and CD4+ T cells, along with a significant increase in regulatory T cells (Tregs), in the adaptive immune cell population in CC.In the myeloid immune cell population, M1 macrophages show significant downregulation, whereas M2 macrophages exhibit significant upregulation in CC.Moreover, neutrophils, monocytes, dendritic cells, and mast cells demonstrate significant downregulation, while natural killer (NK) cells show upregulation in CC across both adaptive and innate immune cell populations (Figure 6A).These findings illuminate the immune cell composition in the tumor microenvironment and its potential influence on tumor progression.The alterations in the immune cell composition documented in CC suggest a complex interplay between the tumor and the immune system, which may contribute to the immunosuppressive characteristics of the tumor microenvironment in CC.Moreover, these insights may have implications for the development of targeted immunotherapies to modulate the immune landscape in CC.We can target the disparities in immune cell infiltration between colon cancer and normal intestinal tissue by developing targeted drugs that enhance the activity of specific immune cells, such as NK cells, CD8 T cells, etc., amplifying the activity and quantity of these immune cells and consequently eradicating cancer cells to achieve effective cancer treatment.Immune checkpoint genes (ICGs) are vital in evading selfreaction and serve as novel targets for developing cancer treatment methods [12].We can observe significant up-regulation of TNFSF9 and VTCN1, while BTNL3, CEACAM1, CD209, CD160, and KIR2DL4 are significantly down-regulated (Figure 6C).TNFSF9 is significantly up-regulated in pancreatic cancer (PC) and may promote the growth and metastasis of PC in vivo and in vitro through the Wnt/Snail signaling pathway.Additionally, TNFSF9 can induce the M2 polarization of macrophages by activating the Wnt signaling in pancreatic cancer cells, thereby promoting the metastasis of PC [13].VTCN1 (B7-H4) is highly expressed in many tumor tissues.The biological activity of B7-H4 is associated with a reduced inflammatory CD4 T cell response, and the correlation between tumor-associated macrophages expressing B7-H4 and regulatory T cells (Tregs) expressing FoxP3 in the tumor microenvironment [14].The human intestinal epithelial cells express BTNL3, which induces a selective TCR-dependent response in human colonic Vγ4 cells [15].CEACAM1 serving as an allosteric ligand of TIM-3, is essential for its ability to mediate T cell suppression.This interaction plays a crucial role in regulating both self-immunity and anti-tumor immunity [16].In small cell lung cancer, M1 macrophages are up-regulated in the CD209-High group.The activation of the CD209 signaling pathway is associated with increased infiltration of CD8 T cells, and the activation of the CD209 signaling pathway is also associated with increased neutrophil infiltration [17].In patients with hepatocellular carcinoma (HCC), the intra-tumoral expression of CD160 is decreased in NK cells, but not in CD8+ T cells [18].Knocking down KIR2DL4 in human NK cells in vitro can inhibit their cytotoxicity and also suppress the secretion of tumor necrosis factor α and interferon γ.Conversely, upregulation of KIR2DL4 can activate the MEK/ERK signaling pathway, which constitutes an activation pathway for NK cells [19].Human intestinal epithelial cells express BTNL3, which triggers a selective TCR-dependent response in human colonic Vγ4 cells.CEACAM1 acts as an allosteric ligand of TIM-3, essential for mediating T cell suppression.Conversely, upregulation of KIR2DL4 can activate the MEK/ERK signaling pathway, an activation pathway for NK cells.Immune checkpoint (ICG) therapy is an emerging cancer treatment method that modulates the immune system to suppress tumor growth.This form of treatment involves the use of medications to block tumor cells from evading immune recognition, enabling the immune system to identify and combat the tumor cells.The approach has been widely implemented across various cancer types and has demonstrated promising therapeutic effects.The underlying principle of immune checkpoint therapy is that the immune system is capable of recognizing and eliminating abnormal cells, including tumor cells, under normal circumstances.However, tumor cells often employ immune checkpoints to evade immune system attacks.Immune checkpoints represent a molecular signaling system that regulates immune system activity, preventing it from targeting normal tissues.Through interaction with these checkpoints on immune cells, tumor cells can evade immune recognition and subsequent attack.The key to immune checkpoint therapy lies in blocking these signals, thus reversing immune system suppression and reinstating its ability to target and attack tumor cells.An important advantage of immune checkpoint therapy is its lasting therapeutic effects.In comparison to conventional treatments such as radiotherapy and chemotherapy, this approach not only diminishes tumor volume but also triggers sustained immune responses against the tumor.This strategy can lead to the development of targeted therapeutic drugs for alterations in immune checkpoint expression in colon cancer, offering newfound hope and opportunities for numerous CC patients, alleviating their suffering.
Effective drug therapies for CC treatment are currently insufficient.Thus, there is an urgent need to investigate potential drugs for this purpose.Our study offers a fresh perspective by utilizing cMAP analysis to link CC-related pathogenic genes in the search for potential compounds for CC treatment.Through cMAP analysis, we have identified candidate drugs, including ISOX, vorinostat, NVP-AUY922, selumetinib, AS-703026, THM-I-94, NVP-TAE684, trichostatin-a, and scriptaid.It is worth noting that ISOX exhibits the highest negative enrichment scores in the cMap analysis, indicating its strong potential in reversing the expression of relevant pathogenic genes in CC.
Notably, in cMap analysis, ISOX shows the highest negative enrichment scores, suggesting a significant reversal of the expression of pathogenic genes in CC.ISOX, also known as CAY10603, is a selective inhibitor of histone deacetylase 6 (HDAC6) [20].ISOX significantly inhibits the survival of osteosarcoma cells in a dose-dependent manner.It also dose-dependently inhibits proliferation, colony formation, migration, and invasion.Further in vivo experiments using animal models demonstrate that ISOX treatment significantly suppresses tumor growth.Flow cytometry analysis indicates that ISOX treatment induces increased infiltration of CD8+ T cells into the tumor [21].ISOX inhibits HDAC6, leading to a significant reduction in c-Jun N-terminal kinase (JNK) and c-Jun phosphorylation, preceding its inhibitory effect on the growth of glioma cells.These effects are attributed to the HDAC6 inhibitor-induced inhibition of mitogen-activated protein kinase 7 (MKK7), which has been identified as crucial in JNK activation and carcinogenesis in glioma cells [22].Thus, it is speculated that early administration of ISOX in CC patients may inhibit the onset and advancement of the disease, consequently leading to a substantial extension of patients' lifespans.
Ferroptosis, an iron-dependent regulated cell death pathway induced by the toxic buildup of lipid peroxides on cell membranes, shows significant promise in cancer treatment.Our study revealed the critical involvement of CDKN2A in CC.Elevated CDKN2A expression in CC is notably linked to a poor prognosis (Figure 3).Furthermore, we observed a close association between CDKN2A and ferroptosis (Figure 9D).The CDKN2A gene encodes two proteins, p14 and p16.p14 determines cell fate by indirectly stabilizing p53, while p16 suppresses tumor formation by inhibiting CDK4/6 [23,24].The expression of the CDKN2A gene can result in cell cycle arrest at the G1 phase, leading to the inhibition of cell proliferation and the promotion of tumor cell apoptosis [25].Research has indicated that the loss of the CDKN2A gene alters the lipid composition of glioblastoma multiforme (GBM), rendering GBM cells sensitive to lipid peroxidation and ferroptosis.This loss also reduces the storage of oxidative polyunsaturated fatty acids (PUFAs) in lipid droplets.Furthermore, the loss of P16 alone is sufficient to make GBM cells sensitive to ferroptosis [26].We can consider developing targeted drugs against the CDKN2A gene, which could promote the development of cancer cells towards the ferroptosis pathway.This novel treatment approach has the potential to lead the way in revolutionizing the field of cancer treatment, bringing new hope and possibilities for patients.This cutting-edge research will not only drive innovation in treatment methods but also pave the way for the application of ferroptosis in cancer treatment, offering patients more possibilities and hope.

Data Collection
The mRNA expression profiles of 471 Colon Adenocarcinoma (COAD) and 41 normal tissues were obtained from the TCGA Genomic Data Commons (GDC) database (https: //portal.gdc.cancer.gov/,accessed on 17 December 2023).To retrieve and organize data, the R package TCGAbiolinks can be employed.However, detailed clinical and pathological information was available for only 424 colon cancer samples, and the comprehensive clinical characteristics of these patients are presented in Table 1.Patients with missing or incomplete follow-up data were excluded from the survival analysis.Subsequently, survival analysis for 424 COAD patients was further performed to investigate the relevant differentially expressed genes.DESeq2, edgeR, and limma are different R packages for downstream differential gene expression analysis.DESeq2 (differential expression analysis of RNA-Seq data) is based on a negative binomial distribution model.It considers differences between samples and the variability of gene expression using the Bayesian method.edgeR (Empirical Analysis of Digital Gene Expression Data in R) is also based on a negative binomial distribution model.It uses a Bayesian method to improve stability of estimates by adapting the estimation of within-group variability.Limma (linear models for microarray analysis) is based on a linear model and uses the Bayesian method to estimate differential variances for each gene.Utilizing three distinct R packages for the screening of differentially expressed genes is intended to improve the credibility and robustness of the findings, thereby facilitating more accurate subsequent analyses, including enrichment and survival analyses, thus yielding more precise results.Genes exhibiting a Log2-fold change ≥ 2 were categorized as differentially expressed genes (DEGs), with statistical significance in gene expression determined by a p value < 0.05.We classified the differential analysis results from the three major R packages into upregulated and downregulated genes and took the intersection.The genes from this intersection were used for subsequent analyses.

Functional Enrichment Analysis
We performed Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses to ascertain the biological functions of the differentially expressed genes (DEGs).Visualizations were conducted using R packages such as clusterProfiler and ggplot2, with a significance threshold set at p < 0.05, FDR < 0.01 (Table 2).

Survival Analysis
We performed survival analysis using two different methods, the log-rank test and single-factor Cox regression analysis, to select survival-related genes (log_rank_p < 0.05 and cox_results_p < 0.05).We conducted KM analysis and drew survival curves using the R packages survival and survminer.We then took the intersection of the selected survival-related genes and the previously identified DEGs to obtain 47 hub genes.We utilized the ggplot2 package to illustrate the KM plots of these 47 hub genes and visualized the expression levels of these genes in the TCGA-COAD dataset.The enrichment status of each entry in the GO analysis of these 47 hub genes has been summarized (Table 3).

PPI Network Construction
After inputting 47 hub genes into STRING (https://string-db.org/,accessed on 30 January 2024) (version 12.0), a protein-protein interaction (PPI) network was predicted.The minimum required interaction score was based on 0.15.Then, Cytoscape (version 3.10.0)was used to visualize the PPI network.We adjusted the node sizes in the PPI network based on their degrees, using log2FC to change node colors (red for upregulation and blue for downregulation), and employed the combined_score to regulate the thickness and color gradient of the edges.To identify crucial nodes in a biological network, the cytoHubba plugin can implement the degree algorithm to assess the connectivity of network nodes.This method aids in the identification of highly connected nodes that often possess significant influence within the network.Leveraging cytoHubba and the degree algorithm, we can gain vital insights into the structural and functional attributes of biological networks, thereby contributing to the discovery of potential biomarkers, drug targets, and diseaseassociated genes.

LASSO Regression Analysis
We performed LASSO (Least Absolute Shrinkage and Selection Operator) regression analysis on the 47 hub genes.We used the glmnet package to build a LASSO model.We then chose an appropriate lambda value to build the model, where the size of lambda determines the number of genes selected for the model.Two dashed lines indicate two special lambda values: lambda.minand lambda.1se.The lambda values between these two are considered suitable.The model built with lambda.1se is the simplest, with fewer genes used, while lambda.minhas slightly higher accuracy and uses a greater number of genes.We chose to use lambda.min to build the model.We used the pROC and ggplot2 packages to plot ROC curves for lambda.minand lambda.1se.The AUC value, ranging from 0 to 1, reflects the model's performance-closer to 1 indicates better performance.We then utilized the survminer, survival, and timeROC packages to plot time-ROC curves.We employed the ggrisk package to construct a linked three-panel plot of risk factors.

Immune Cell Infiltration Analysis
We utilized the IOBR package to perform immune infiltration analysis and employed the xCell method to calculate immune cell infiltration, thereby exploring the immune microenvironment of the disease [27].The results of immune infiltration were visualized using the ggplot2 package.We identified immune checkpoint genes through a literature review and studied their differential expression in TCGA-COAD.We then utilized ggplot2 and ggpubr for visualization.

Mutation Analysis
We used the R package TCGAmutations from GitHub to retrieve the data and visualized the data using the maftools package.

Connectivity Map (cMAP) Analysis
CMAP (https://clue.io) is a gene expression signature-based database that elucidates relationships between diseases, genes, and small molecule compounds [28][29][30][31].In this study, 150 upregulated and 150 downregulated differentially expressed genes (DEGs) from the TCGA-COAD dataset were separately incorporated into the cMAP online database to identify potential small molecule drugs for the treatment of colon cancer.Ultimately, nine compounds with an enrichment score less than −90 were identified.

Ferroptosis Analysis
Ferroptosis is an iron-dependent regulated cell death mechanism closely associated with cancer.The process of ferroptosis is complex.In order to investigate the relationship between colon cancer and ferroptosis, we downloaded the driver genes and suppressor genes from the FerrDB (http://www.zhounan.org/ferrdb/current/)database for analysis of the expression changes in these genes.We then used ggplot2 and ggpubr packages for visualization.

Conclusions
In summary, we conducted differential gene expression (DEGs) and enrichment analyses utilizing the TCGA-COAD dataset.We developed a survival-related gene risk model through LASSO regression to accurately predict the prognosis of CC patients.Furthermore, we investigated the immune microenvironment, predicted small molecule drugs, and explored ferroptosis in CC.These results can significantly contribute to the development of therapeutic drugs for CC.

Figure 1 .
Figure 1.DEGs identification.(A) The three volcano plots were generated using the TCGA-COAD dataset by the three major R packages (DESeq2, edgeR, and limma).In the plots, red represents the significantly upregulated genes, and blue represents the significantly downregulated genes.(B) The PCA analysis plot of the TCGA-COAD dataset and the Venn diagram of differentially expressed genes (DEGs) from the three major R packages.(C) Heatmap of DEGs in the TCGA-COAD dataset.

Figure 2 .
Figure 2. Gene set enrichment analysis.Performing an enriched analysis of common differentially expressed genes (DEGs) in terms of Gene Ontology (GO) for biological process (BP), cellular component (CC), and molecular function (MF), as well as Kyoto Encyclopedia of Genes and Genomes (KEGG) signaling pathways.(A) GO-BP.(B) GO-CC.(C) GO-MF.(D) KEGG.

Figure 2 .
Figure 2. Gene set enrichment analysis.Performing an enriched analysis of common differentially expressed genes (DEGs) in terms of Gene Ontology (GO) for biological process (BP), cellular component (CC), and molecular function (MF), as well as Kyoto Encyclopedia of Genes and Genomes (KEGG) signaling pathways.(A) GO-BP.(B) GO-CC.(C) GO-MF.(D) KEGG.

Figure 4 .
Figure 4. PPI network construction.(A) Protein-protein interaction (PPI) network of the 47 hub genes.(B) cytoHubba was performed to screen the top 15 genes based on previous PPI networks.

Figure 4 .
Figure 4. PPI network construction.(A) Protein-protein interaction (PPI) network of the 47 hub genes.(B) cytoHubba was performed to screen the top 15 genes based on previous PPI networks.

Figure 7 .
Figure 7. Mutation analysis.(A) Overview plot of gene mutations.(B) Mutation spectrum plot of genes selected by lasso.min.

Figure 8 .
Figure 8. Connectivity map (cMAP) analysis.(A) Heatmap of significant perturbation compounds.(B) The chemical structures of those nine compounds.

Figure 8 .
Figure 8. Connectivity map (cMAP) analysis.(A) Heatmap of significant perturbation compounds.(B) The chemical structures of those nine compounds.

Figure 9 .
Figure 9. Ferroptosis analysis.(A) Lollipop chart depicting the differential expression of ferroptosis driver genes.(B) Lollipop chart depicting the differential expression of ferroptosis suppressor genes.(C) Ferroptosis corresponding protein-protein interaction (PPI) network.(D) Visualization of the PPI network for the top 10 genes associated with ferroptosis.

Figure 9 .
Figure 9. Ferroptosis analysis.(A) Lollipop chart depicting the differential expression of ferroptosis driver genes.(B) Lollipop chart depicting the differential expression of ferroptosis suppressor genes.(C) Ferroptosis corresponding protein-protein interaction (PPI) network.(D) Visualization of the PPI network for the top 10 genes associated with ferroptosis.

Table 1 .
Clinical features in COAD patients.

Table 2 .
Important terms in GO and KEGG enrichment results.