Uncovering Potential Therapeutic Targets in Colorectal Cancer by Deciphering Mutational Status and Expression of Druggable Oncogenes

Background: Numerous driver mutations have been identified in colorectal cancer (CRC), but their relevance to the development of targeted therapies remains elusive. The secondary effects of pathogenic driver mutations on downstream signaling pathways offer a potential approach for the identification of therapeutic targets. We aimed to identify differentially expressed genes as potential drug targets linked to driver mutations. Methods: Somatic mutations and the gene expression data of 582 CRC patients were utilized, incorporating the mutational status of 39,916 and the expression levels of 20,500 genes. To uncover candidate targets, the expression levels of various genes in wild-type and mutant cases for the most frequent disruptive mutations were compared with a Mann–Whitney test. A survival analysis was performed in 2100 patients with transcriptomic gene expression data. Up-regulated genes associated with worse survival were filtered for potentially actionable targets. The most significant hits were validated in an independent set of 171 CRC patients. Results: Altogether, 426 disruptive mutation-associated upregulated genes were identified. Among these, 95 were linked to worse recurrence-free survival (RFS). Based on the druggability filter, 37 potentially actionable targets were revealed. We selected seven genes and validated their expression in 171 patient specimens. The best independently validated combinations were DUSP4 (p = 2.6 × 10−12) in ACVR2A mutated (7.7%) patients; BMP4 (p = 1.6 × 10−04) in SOX9 mutated (8.1%) patients; TRIB2 (p = 1.35 × 10−14) in ACVR2A mutated patients; VSIG4 (p = 2.6 × 10−05) in ANK3 mutated (7.6%) patients, and DUSP4 (p = 7.1 × 10−04) in AMER1 mutated (8.2%) patients. Conclusions: The results uncovered potentially druggable genes in colorectal cancer. The identified mutations could enable future patient stratification for targeted therapy.


Introduction
Colorectal cancer (CRC) is a major cause of cancer-related deaths worldwide [1], with steeply increasing rates in some developed countries [2], especially among adults aged <50 years [3]. About poor survival outcome and filtered out actionable genes to provide clinically relevant targets for drug development. The expression of the top genes was validated in an independent clinical cohort.

Sequencing and Expression Database
Expression and mutation data were retrieved from the TCGA repository (https://portal.gdc.cancer.gov/). Mutations identified with the mutect2 algorithm were downloaded in VCF format. The variations were chosen depending on their mutect2 'PASS' status-mutations with 40x the overall coverage and a minimum of 5 reads were selected. The remaining mutations were annotated using the snpEff [19] program based on the GRCh38 human genome version. In the database construction, only the canonical isoforms were selected. The mutations were categorized as any, coding, non-coding and disruptive mutations, based on the impact on the protein sequence.
The disruptive mutations included those affecting the START or STOP codon, producing an early stop codon, affecting the splice-site (but not the splice region), or creating frame shifts (insertions and deletions affecting less than 3 nucleotides or exceeding the multiples of 3), as illustrated in Figure 1A [20]. The expression database was constructed from the raw HTseq-count read counts. The obtained read counts were normalized using the DESeq2 [21] algorithm.

Gene Selecting Algorithm
We split the specimens into two cohorts based on the mutation status. The normalized expression of the wild-type and mutant specimens was compared by a Mann-Whitney test. Only disruptive mutations were included, and we repeated the process for each selected gene. Statistical significance was accepted in case of p < 0.01 and a fold change (FC) cutoff at 1.44. The Benjamini-Hochberg false discovery rate was executed to safeguard against errors resulting from multiple hypothesis testing.

Survival Analysis
We searched PubMed Gene Expression Omnibus (GEO) data repository (https://www.ncbi. nlm.nih.gov/geo/) to identify colorectal cancer datasets with published survival times, as described previously [10]. The raw data were re-normalized, and the datasets were combined into a single database. To select the most reliable probe set for each gene, we used JetSet, a tool measuring and comparing different quality parameters for each probe set [22]. Survival analysis was performed for the selected genes using Cox proportional hazards regression. The analysis was performed using the auto-select best cutoff function, whereby all the possible cutoff values between the upper and lower quartiles were computed and the best performing threshold was used as a cutoff. Kaplan-Meier plots were drawn to visualize survival differences.

CRC Specimens for the Validation Cohort
For the clinical validation set up to identify top genes, we analyzed formalin-fixed paraffin-embedded (FFPE) cancer tissues from 171 CRC patients who underwent surgery at the Tokyo Medical and Dental University Hospital (TMDU), Tokyo, Japan from 2007 to 2011. All cases were histologically confirmed as adenocarcinoma and were classified into tumor-node-metastasis (TNM) stages after surgery according to the American Joint Committee on Cancer (AJCC), 7th edition. This study was conducted in accordance with the Declaration of Helsinki. All the procedures associated with this study were approved by the Institutional Review Boards of the Baylor Scott & White Research Institute (approval number: 003-180), and all the patients provided written informed consent.

Nucleic Acid Isolation and Gene Expression Analysis
The total RNA was isolated from the FFPE tissues using an ALL Prep DNA/RNA FFPE kit (Qiagen, Valencia, CA, USA). A synthesis of the complementary DNA (cDNA) was conducted with 500ng of the total RNA using the High Capacity cDNA Reverse Transcription Kit (Applied Biosystems, Foster City, CA, USA). qRT-PCR was performed using the SensiFAST™ SYBR ® Lo-ROX Kit (Bioline, London, UK) on the Quantstudio 7 Real Time PCR System (Applied Biosystems). The relative expression levels of the target genes were normalized against beta-actin using the 2 −∆CT method. The PCR primers used in the current study are shown in Table S1.

Availability of Data and Material
Expression and mutation data were retrieved from the TCGA repository (https://portal.gdc.cancer. gov/).

Database Setup
Data from 459 patients diagnosed with colon adenocarcinomas (COAD) and 170 patients with rectal adenocarcinomas (READ) were available from the TCGA repository, including a small subset of patients diagnosed with mucinous adenocarcinomas (COAD: 13.5%; READ: 7.6%). Most COAD patients were diagnosed with a clinical stage II disease, while roughly an equal number of READ patients were diagnosed with stage II and III disease. Altogether 55.4% of COAD patients were  (Table S2). The balanced incorporation of tumors with early and more advanced disease stages avoids selection bias for more aggressive tumors.
Venous invasion was present in about one fifth of all patients, and over one third of the patients were diagnosed with lymph node involvement. Males were slightly overrepresented in both COAD and READ. Over two third of patients were 60 years of age or older, with a median age of 68 years in COAD, and 66 years in READ. Residual disease was absent after surgery in most patients (COAD: 72.1%; READ: 74.7%). During the median follow-up period (COAD: 22.3 months; READ: 20.58 months), 22.2% of COAD and 15.9% of READ patients died. Only a small proportion (2.4%) of COAD patients and none of the READ patients were identified with MSI-high disease, although information about MSI was lacking for most patients (for details see Table S2). We combined the COAD and READ datasets when comparing mutation status and gene expression levels to increase the statistical power of the analysis.

Database Setup
Data from 459 patients diagnosed with colon adenocarcinomas (COAD) and 170 patients with rectal adenocarcinomas (READ) were available from the TCGA repository, including a small subset of patients diagnosed with mucinous adenocarcinomas (COAD: 13.5%; READ: 7.6%). Most COAD patients were diagnosed with a clinical stage II disease, while roughly an equal number of READ patients were diagnosed with stage II and III disease. Altogether 55.4% of COAD patients were diagnosed in either stage I (16.6%) or stage II (38.8%), and about half (49.4%) of READ patients were diagnosed in stage I/II (stage I: 19.4%; stage II: 30.0%) ( Table S2). The balanced incorporation of tumors with early and more advanced disease stages avoids selection bias for more aggressive tumors.
Venous invasion was present in about one fifth of all patients, and over one third of the patients were diagnosed with lymph node involvement. Males were slightly overrepresented in both COAD and READ. Over two third of patients were 60 years of age or older, with a median age of 68 years in COAD, and 66 years in READ. Residual disease was absent after surgery in most patients (COAD: 72.1%; READ: 74.7%). During the median follow-up period (COAD: 22.3 months; READ: 20.58 months), 22.2% of COAD and 15.9% of READ patients died. Only a small proportion (2.4%) of COAD patients and none of the READ patients were identified with MSI-high disease, although information about MSI was lacking for most patients (for details see Table S2). We combined the COAD and READ datasets when comparing mutation status and gene expression levels to increase the statistical power of the analysis.

Frequently Mutated Genes
Altogether, 53,373 disruptive mutations were identified in the sample population affecting 14,021 out of 39,916 coding and non-coding genes, as annotated by snpEff (http://snpeff.sourceforge.net/index.html) [19]. The mutations displayed a skewed distribution: most mutations appeared in very few cases (1-2 specimens), while only 10 genes (excluding TTN) were mutated in more than 40 patients. The 40 most frequently mutated genes are listed in Table S3. We selected the top ten (excluding TTN) most frequently mutated genes (APC, TP53, SYNE1, AMER1, SOX9, ACVR2A, ANK3, ARID1A, FBXW7, MDN1) ( Figure 2) for subsequent analysis.  The presence of selected disruptive mutations was not linked to patient sex in the entire dataset, nor to tumor stage in READ, although in COAD, two mutations remained differently distributed across tumor stages after multiple testing corrections: the ACVR2A mutations were linked to earlier tumor stages (stages I/II, p = 0.007) while the APC mutations were associated with an advanced disease (stage IV, p = 0.008). The locations of the disruptive mutations for the select genes are illustrated in Figure 1B. The raw data of Figures 1B and 2 are presented in Tables S4 and S5. The expression levels of 20,500 genes were investigated in our patient population for correlation to each mutation.

Survival Analysis
For each mutation, the top 50 upregulated genes were selected for subsequent survival analysis, and all the genes were included when the number of upregulated genes was less than 50 (5 genes for TP53 and 21 for AMER1). Out of the 426 upregulated genes, 95 were associated with worse outcome (Table S6), while 177 overexpressed genes correlated with better relapse-free survival (p < 0.05). For 117 genes, the association between gene expression and clinical outcome was not significant, and 38 genes could not be identified in the transcriptomic dataset.

Selection of Potentially Actionable Genes
For potentially actionable genes, the Drug-Gene Interaction database DGIdb 3.0 [23] was analyzed. The database aggregates druggable gene categories from multiple sources and predicts druggability by various methods. Therefore, potentially actionable genes may or may not have existing drugs that target them. Out of the 95 upregulated genes linked to worst survival, 37 were classified as theoretically druggable (Table S6). We selected seven genes (TRIB2, TRIM7, DUSP4, PKM2, VSIG4, BMP4 and IL1RN) based on MW p-values, expression differences, and effects on survival (Figures S1 and S2) for subsequent clinical validation. Out of the gene list, two upregulated genes (TRIM7, DUSP4) were linked to multiple mutations. The selection process of overexpressed genes linked to certain mutations is illustrated in Figure 3 and the differential expression for the top genes is presented in Figure 4.

Validation Sample Set
Altogether, 171 patient specimens were available for subsequent clinical validation of the selected genes. The majority of the patients were diagnosed with stage II and III disease, with 45 mm median tumor size. The median age at diagnosis was 67 years, although two third of the patient population was at least 60 years old, with slight overrepresentation of males. The majority of patients were diagnosed with the presence of venous invasion, and more than half with lymphatic invasion. The median recurrence-free survival (RFS) was 50 months, and 26.9% of patients recurred during follow up. The median overall survival (OS) was 60 months, and altogether, 19.9% of patients succumbed to the disease. Only 6.3% of patients were identified with microsatellite instability (MSI)-high disease, although information was unavailable for more than half of the patients. The characteristics of the validation dataset are summarized in Table 1.

Genes Associated with RFS in a Clinical Validation Set
The expression of the selected potentially actionable genes was validated with qPCR ( Figure S3). High expression of TRIB2 (p = 0.0003), VSIG4 (p = 0.0003), BMP4 (p = 0.023) and DUSP4 (p = 0.043) was associated with worse RFS in the validation dataset ( Figure 5), although the expression of TRIM7, PKM2 and IL1RN was not significantly different between patients with or without disease recurrence (p > 0.05). We used the median as a cutoff in the clinical validation cohort. We also analyzed RFS using Youden Index as a cutoff value (please see Figure S4). The genes associated with survival disadvantage were linked to mutations in ACVR2A (TRIB2, DUSP4), ANK3 (VSIG4, DUSP4), SOX9 (BMP4) and AMER1 (DUSP4) (Figure 4).

Discussion
Despite the well-described driver genetic alterations in CRC, the availability of targeted therapies is limited, and the development of new personalized treatment methods has been rather slow. Mutations may impact downstream signaling pathways and alter the expression of potentially actionable genes. Evidence suggests that secondary genetic alterations may have a strong prognostic

Discussion
Despite the well-described driver genetic alterations in CRC, the availability of targeted therapies is limited, and the development of new personalized treatment methods has been rather slow. Mutations may impact downstream signaling pathways and alter the expression of potentially actionable genes. Evidence suggests that secondary genetic alterations may have a strong prognostic relevance [18], which provides an opportunity to utilize them in medical target selection. Following this approach, we identified four genes: TRIB2, VSIG4, BMP4, and DUSP4. These genes were overexpressed in CRC specimens with disruptive mutations, were linked to worse survival in multiple datasets and are deemed potentially actionable. The mutations affecting ACVR2A, ANK3, SOX9 and AMER1 may be utilized as biomarkers for patient stratification.
TRIB2 (Tribbles Pseudokinase 2) belongs to the Tribbles family of proteins that are structurally similar to kinases but lack catalytic activity. Growing evidence suggests the role of TRIB2 as a pro-oncogene with a regulatory function underlying drug resistance both in solid and hematological malignancies [24,25]. In concordance with our findings, elevated TRIB2 expression was linked to poor prognosis in CRC, with higher expression in patients with frequent disease recurrence [26]. Consistently, ectopic or intrinsic high expression of TRIB2 in CRC confers resistance to chemotherapy by activating AKT [27]. TRIB2 down-regulation inhibits proliferation, induces cell-cycle arrest and senescence in CRC cell lines [26]. TRIB2 is a potential novel therapeutic target both for acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL) [28]. Recent drug repurposing approaches identified the small-molecule protein kinase inhibitors afatinib and neratinib as active substances in AML cell lines that rapidly degrade TRIB2, eliciting survival benefits [29]. Afatinib has already been investigated both as a monotherapy and in combination regimens in CRC patients, but with limited antitumor activity [30], although biomarker-driven patient recruitment may increase therapy effectiveness. In our dataset, elevated TRIB2 expression was specifically associated with the presence of ACVR2A mutations. ACVR2A (Activin A Receptor Type 2A) is one of the most frequently mutated genes in hypermutated CRC [31]. ACVR2A encodes a transmembrane receptor, which is a member of the TGF-beta superfamily of receptors with multiple biological functions, including the regulation of proliferation and cell migration [32]. ACVR2A down-regulation has been linked to advanced stage, lymphatic invasion and reduced survival, suggesting its prognostic significance in CRC [31]. Since ACVR2A mutations are most frequent in hypermutated CRCs, TRIB2 may be most suitable as a therapeutic biomarker within this population, although the low proportion of patients with known MSI-high disease limits conclusions for MSI-specific biomarkers.
VSIG4 (V-Set and Immunoglobulin Domain Containing 4) is a newly identified immune checkpoint of the B7 family of immune regulatory proteins with similar functions as CTLA-4 and PD-1 in T-cell inhibition. Surface expression of VSIG4 protein is restricted to resting macrophages [33]. Its expression in tumor-associated macrophages is a potential facilitator of NSCLC development by inhibiting CD4+ and CD8+ T-cell proliferation and cytokine production [34]. In the current study, elevated VSIG4 expression was associated with the presence of disruptive ANK3 mutations and recurrent disease. ANK3 (Ankyrin G) provides cellular stability by anchoring cytoskeleton to the plasma membrane, and its down-regulation has been associated with poor prognosis [35] and increased invasive potential [36] in solid malignancies. Potential immune checkpoint inhibitor therapy may be most relevant for patients with upregulated VSIG4, especially those with simultaneous ANK3 mutations.
Bone morphogenetic proteins (BMPs) have received much attention due to their role in tumor development and dissemination, and the wealth of conflicting studies indicates that BMPs may stimulate tumorigenesis in one cancer type but suppress it in another [37]. The BMP4 (Bone Morphogenetic Protein 4) gene encodes a secreted ligand of the TGF-beta superfamily of proteins, resulting in the activation of SMAD family transcription factors [38]. BMP4 is overexpressed in CRC compared to normal tissue [39], and BMP4-overexpressing clones created from HCT116 cells exhibit enhanced migration and invasion [40]. In our data, BMP4 upregulation was associated with the presence of SOX9 mutations. SOX9 (Sex-determining region Y (SRY)-box 9) is an intensely studied transcriptional factor, which plays critical roles in the development and lineage commitment of various tissues [41]. In the intestinal epithelium, it is mainly expressed in the progenitor cells of the colon and small intestines and plays a crucial role in stem-cell maintenance. It is also a downstream target of Wnt/β-catenin signaling, with possible roles in β-catenin regulation [42]. There is growing evidence of the importance of SOX9 in CRC development. SOX9 is mutated in 5-10% of all CRC cases [31] and leads to SOX9 overexpression [43]; this association is also confirmed in our dataset. It is likely a pro-oncogene, as high SOX9 expression has been linked to advanced tumor stage [44] and adverse prognosis in primary CRC [45]. BMP signaling and SOX9 interact in various developmental processes, such as chondrogenesis and the regulation of stem cells [46]. Targeting BMP4 may thus provide a viable therapeutic option, especially for patients with mesenchymal subtypes of CRC.
DUSP4 (Dual Specificity Phosphatase 4) is a negative regulator of mitogenic signal transduction associated with proliferation and differentiation by dephosphorylating members of the MAPK superfamily (ERK1, ERK2, JNK) [47]. Higher DUSP4 expression characterizes MSI-high cell lines compared to microsatellite stable cells associated with increased cellular proliferation [48]. Consistently, in our dataset, significant DUSP4 upregulation characterized tumors with various (ACVR2A, AKN3 and AMER1) mutations, with the highest expression in ACVR2A mutant cases. Both ACVR2A and ANK3 mutations have been mainly observed in hypermutated/MSI-high tumors [31,49], however the lack of information about MSI status for the majority of specimens does not allow conclusions for MSI-specific biomarkers. AMER1 (APC Membrane Recruitment Protein 1) is a tumor suppressor mutated in about 10% of CRC patients, especially in tumors with mesenchymal phenotypes linked to canonical Wnt-pathway inhibition [50]. Tumors lacking AMER1 mainly belong to the EMT subtype associated with poor prognosis and resistance to chemotherapy [50]. Thus, targeting DUSP4 may provide a treatment approach for a wider selection of CRC patients with distinct molecular subtypes.

Conclusions
The evolution of precision medicine in CRC remains slow, with a limited efficacy of single-mutation/single-drug approaches [11]. Nevertheless, linking the mutation status of oncogenes with targetable opportunities in downstream signaling pathways provides a novel approach for therapy development. The proposed biomarkers represent divergent biological mechanisms and may be particularly suitable for distinct molecular subtypes of CRC. For example, ACVR2A mutations are most frequent in hypermutated CRC and therefore, targeting up-regulated TRIB2 may potentially be suitable in such tumors, while BMP4 may be a better suited biomarker for CRC patients with SOX9 mutations and mesenchymal subtypes. In contrast, multiple mutations result in DUSP4 upregulation, providing a potential treatment for a larger selection of patients. The suggested biomarker-defined patient stratification may provide an attractive approach toward therapy optimization.
Supplementary Materials: The following are available online at http://www.mdpi.com/2072-6694/11/7/983/s1: Figure S1: Sensitivity and specificity of the seven selected potentially actionable genes associated with frequent disruptive mutations by ROC analysis, Figure S2: High expression of the selected seven potentially actionable genes from the training dataset is linked to worse relapse-free survival based on Cox proportional hazard regressions at p < 0.05, Figure S3: Box plots indicating expression of the potentially actionable genes in the validation cohort, Figure S4: Recurrence-free survival of the clinical validation cohort, stratifying the population into high and low risk using the Youden index as a cutoff, Table S1: Primer sequences used in the qPCR validation, Table S2: Clinical characterization of colon and rectal cancer patient populations from the training datasets, Table  S3: Top 40 most frequently mutated genes (disruptive mutations exclusively) in the CRC patient population (n = 582), Table S4: Mutations in the training dataset, Table S5: Raw data including stage, sex and mutations in the training dataset, Table S6: List of upregulated genes associated with worse clinical outcome (RFS, p < 0.05).