Genetic Variants as Predictors of the Success of Colorectal Cancer Treatments

Simple Summary
Some colorectal cancer (CRC) outcomes are partially associated with genetics, and different studies have proposed several genetic variants as predictors. However, their performance in other populations has rarely been analyzed. Thus, our objectives were to assess their use in our cohort and to find additional genetic variants associated with CRC outcomes. We found that some of the genetic variants proposed as predictors could be used in our cohort, although adding clinical data improved performance. In addition, we found other genetic variants that could be useful to predict CRC outcomes in our population. Our findings will help to refine the use of genetic polymorphisms to predict CRC outcomes in our population, and we expect that they could be useful for other populations.
Abstract
Background: Some single-nucleotide polymorphisms (SNPs) have been proposed as predictors of different colorectal cancer (CRC) outcomes. This work aims to assess their performance in our cohort and to find new SNPs associated with these outcomes. Methods: A total of 833 CRC cases were analyzed for seven outcomes and the use of chemotherapy, also stratified by tumor location and stage. The performance of 63 SNPs was assessed using a generalized linear model and the area under the receiver operating characteristic curve (AUC), and local SNPs were detected using logistic regression. Results: In total, 26 of the SNPs showed an AUC > 0.6 and a significant association (p < 0.05) with one or more outcomes. However, clinical variables outperformed some of them, and the combination of genetic and clinical data showed better performance. In addition, 49 suggestive (p < 5 × 10−6) SNPs associated with one or more CRC outcomes were detected, located at or near genes involved in biological mechanisms associated with CRC.
Conclusions: Some SNPs, combined with clinical data, can be used in our population as predictors of some CRC outcomes, and the local SNPs detected in our study could be feasible markers that need further validation as predictors.


Introduction
Colorectal cancer (CRC) is the second most diagnosed cancer and the second leading cause of cancer death, accounting for 10% of diagnosed cancers in developed countries [1]. Its risk is influenced by the environment, genetics, and microbial composition, and it can be sporadic or result from inflammatory processes [2][3][4]. Therefore, CRC is a significant public health issue, and strategies are needed to predict prognosis and adjust treatment [5,6].
Different treatments are used in CRC, such as surgery and chemotherapy. Various drugs (e.g., 5-fluorouracil or capecitabine) can be used alone or in combination to treat CRC. Among the factors that could determine the success of these treatments, some genetic polymorphisms (SNPs) have been observed to affect it. SNPs in several candidate genes related to the biological mechanisms of the treatments have been analyzed to test their role in treatment success: survival in FOLFIRI-based treatment [7], survival in Bevacizumab-based treatment [8], and toxicity to 5-fluorouracil and capecitabine [9][10][11][12][13][14][15][16][17][18][19]. In addition, genome-wide association analyses have been used to find SNPs associated with metastatic CRC survival under chemotherapy plus biologics [20], survival in rectal cancer [21], progression-free survival in metastatic CRC under different treatments [22], and survival in CRC [23]. However, there are discrepancies between studies, possibly due to differences in the frequency of the risk variants between populations [6]. It has also been proposed that SNPs related to toxicity could be associated with the efficacy of the treatment [24].
Previously, we analyzed CRC patients from a Basque cohort to study the performance of the available genetic information in assessing the risk of developing CRC. In that study, we showed that the available genetic information could be used for this purpose, although there were local genetic variants that could be relevant to the genetic architecture of CRC in our cohort [25].
Thus, our aim with this study is to assess whether the polymorphisms previously associated with the success of CRC treatment are valid predictors in our cohort and to explore local genetic variants that could predict treatment success.

Recruitment
CRC cases were diagnosed using standard criteria, and the samples used in this study were obtained in standard clinical practice after the patients signed an informed consent form at Hospital Universitario Donostia (San Sebastian, Spain). In total, 869 cases were recruited. The present study was approved by the Local Ethics Committee (Comité de Ética de la Investigación con medicamentos de Euskadi, code: PI+CES-BIOEF 2017-10).

Genotyping
The DNA samples analyzed in this work were genotyped with the Illumina Global Screening Array on the Illumina iScan (Illumina, San Diego, CA, USA) high-throughput screening system at the Institute of Clinical Molecular Biology (Kiel, Germany). Illumina GenomeStudio software (v2.0) and its GenCall algorithm were used to convert raw intensities into genotype calls.
Quality control of the called genotypes and samples was carried out using the following filters: exclusion of samples with ≥5% missing rates, markers with non-called alleles, markers with missing call rates > 0.05, related samples (PI-HAT > 0.1875), samples whose genotyped sex could not be determined, and samples with a high heterozygosity rate (more than three SDs above the mean). In addition, only autosomal SNPs were kept, and markers with Hardy-Weinberg equilibrium p < 1 × 10−5 were removed. Finally, principal component analysis with FlashPCA (v2.0) [26] was used to identify outlier samples (deviation of more than six times the interquartile range).
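The sample and marker filters above can be sketched as follows. This is an illustration only: it assumes per-sample missingness and heterozygosity rates and per-marker call rates and HWE p-values have already been computed (the text does not name the tool used for this step), and the dictionary layouts and function names are hypothetical. Relatedness, sex-check, and PCA-outlier filters are omitted.

```python
from statistics import mean, stdev

# Chromosome labels treated as autosomal (assumes "1".."22" naming).
AUTOSOMES = {str(c) for c in range(1, 23)}


def failing_samples(missing_rate, het_rate, max_missing=0.05, n_sd=3.0):
    """Return sample IDs failing the missingness or heterozygosity filters.

    missing_rate and het_rate are hypothetical dicts: sample ID -> fraction.
    """
    het_values = list(het_rate.values())
    mu, sd = mean(het_values), stdev(het_values)
    # Samples with >= 5% missing genotypes are excluded.
    failed = {s for s, miss in missing_rate.items() if miss >= max_missing}
    # Samples whose heterozygosity exceeds the mean by more than n_sd SDs.
    failed |= {s for s, het in het_rate.items() if het > mu + n_sd * sd}
    return failed


def passing_markers(markers, max_missing=0.05, min_hwe_p=1e-5):
    """Return IDs of autosomal markers passing call-rate and HWE filters.

    Each marker is a hypothetical dict with keys "id", "chrom", "missing", "hwe_p".
    """
    return [
        m["id"]
        for m in markers
        if m["chrom"] in AUTOSOMES
        and m["missing"] <= max_missing  # drop missing call rate > 0.05
        and m["hwe_p"] >= min_hwe_p      # drop HWE p < 1e-5
    ]
```

Note that with very few samples a single outlier inflates the SD enough to hide itself, so this kind of SD-based rule is only meaningful for cohorts of reasonable size such as the one analyzed here.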
The Sanger Imputation Service was used to impute additional SNPs. For that, release 1.1 of the Haplotype Reference Consortium was used as the reference panel, and the EAGLE2+PBWT pipeline was used to carry out the imputation [27][28][29]. The imputed variants were filtered using the following criteria: variants with an INFO score < 0.80, a minor allele frequency (MAF) < 0.01, and non-biallelic markers were removed.
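A minimal sketch of the post-imputation filter, assuming each variant is available as a hypothetical dict holding its alternate alleles, INFO score, and alternate-allele frequency (the actual file format and parsing are not described in the text):

```python
def keep_imputed(variant, min_info=0.80, min_maf=0.01):
    """Apply the post-imputation filters to one variant.

    variant is a hypothetical dict with keys "alt" (list of alternate alleles),
    "info" (imputation INFO score) and "alt_freq" (alternate-allele frequency).
    """
    if len(variant["alt"]) != 1:  # non-biallelic markers are removed
        return False
    # Fold the alternate-allele frequency to obtain the minor allele frequency.
    maf = min(variant["alt_freq"], 1.0 - variant["alt_freq"])
    return variant["info"] >= min_info and maf >= min_maf
```

The folding step matters: a variant whose alternate allele has frequency 0.995 has a MAF of 0.005 and is removed, even though its alternate allele is common.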
After the QC of the imputed data, 5,399,981 SNPs from 833 cases were kept.

Analyses
We analyzed seven outcomes (1-year survival, 3-year survival, 5-year survival, 5 years without relapse, 5 years without relapse in patients treated with 5-fluorouracil-based chemotherapy, 5 years without relapse in patients treated with capecitabine, and 5 years without relapse in patients without chemotherapy) and the use of chemotherapy. In addition, we analyzed the same treatments and outcomes separately by CRC stage (I+II and III+IV) and location (right colon, left colon, and rectum).
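The outcome definitions and strata can be illustrated with the sketch below. The patient record fields, the stage-to-group mapping, and the handling of patients whose follow-up ends before the time horizon (here returned as None and left out) are all assumptions made for illustration; the text does not state how such patients were handled. The same logic could be applied to relapse by replacing the time of death with the time of relapse.

```python
# Hypothetical mapping from TNM stage to the two stage groups used in the text.
STAGE_GROUP = {"I": "I+II", "II": "I+II", "III": "III+IV", "IV": "III+IV"}


def alive_at(patient, years):
    """Binary survival outcome at a given time horizon (in years).

    Returns 1 if the patient was followed for at least `years` without dying,
    0 if the patient died before `years`, and None if follow-up ended earlier
    without death (assumption: such patients are left out of that outcome).
    """
    death = patient["death_years"]  # None if no death was recorded
    if death is not None and death < years:
        return 0
    if patient["followup_years"] >= years:
        return 1
    return None


def stratum(patients, stage_group=None, location=None):
    """Select patients in a stage group and/or location (None means no filter)."""
    return [
        p for p in patients
        if (stage_group is None or STAGE_GROUP[p["stage"]] == stage_group)
        and (location is None or p["location"] == location)
    ]
```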
For each analysis, we analyzed the performance of 63 SNPs previously associated with CRC outcomes (Supplementary Table S1). We retrieved those SNPs from the GWAS Catalog [30], specifically from the studies GCST011584 [20], GCST002820, GCST002821 [21], GCST003057, GCST003058 [22], and GCST003229, GCST003230, and GCST003231 [23]. In addition, SNPs associated with survival in FOLFIRI-based treatment [7], survival in Bevacizumab-based treatment [8], and toxicity to 5-fluorouracil and capecitabine [9][10][11][12][13][14][15][16][17][18][19] were analyzed. We used a generalized linear model to test whether carriership of the tested allele affected a given outcome, and the area under the receiver operating characteristic curve (AUC) to measure performance. Three AUCs were calculated: one using only the carriership of the tested allele as a predictor; one using sex, age, and tumor stage as predictors; and one using the carriership of the tested allele, sex, age, tumor stage, and the first four principal components of the genetic distance of individuals as predictors. These analyses were carried out using the R language [31] and the pROC package [32].
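To make the SNP-only AUC concrete, the sketch below encodes carriership from the count of tested-allele (A1) copies and computes the AUC as the Mann-Whitney probability that a patient with the outcome scores higher than one without it (ties count as one half). This is a stand-in for the R/pROC workflow described above, not a reproduction of it; the genotype and outcome values are hypothetical, and higher scores are assumed to indicate the outcome.

```python
def carrier(a1_count):
    """1 if the genotype has at least one copy of the tested allele (A1), else 0."""
    return 1 if a1_count >= 1 else 0


def auc(scores, labels):
    """ROC AUC as P(score of a case > score of a control), ties counted as 1/2.

    labels: 1 = outcome present, 0 = outcome absent.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(cases) * len(controls))


genotypes = [2, 1, 0, 0, 1, 0]  # hypothetical A1 counts
outcome = [1, 1, 1, 0, 0, 0]    # hypothetical outcome labels
snp_scores = [carrier(g) for g in genotypes]
print(round(auc(snp_scores, outcome), 3))  # 0.667
```

With a single binary predictor the AUC reduces to (sensitivity + specificity) / 2, here (2/3 + 2/3) / 2, which limits how informative one SNP can be on its own. For the clinical and full models, fitted probabilities (for example, from R's `glm` with `family = binomial`) can be passed as `scores` in the same way.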
Moreover, for each outcome, a genome-wide association analysis was performed using logistic regression implemented in Plink [33], adjusting for sex, age, stage, location, and the first four principal components of the genetic distance of individuals. In the analyses of outcomes by stage, the models were adjusted for sex, age, location, and the first four principal components of the genetic distance of individuals; in the analyses of outcomes by location, they were adjusted for sex, age, stage, and the first four principal components of the genetic distance of individuals.
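The three adjustment sets described above can be summarized in a small helper. The covariate names and the function are hypothetical labels for illustration; the sketch only encodes the rule stated in the text, namely that the variable used to define the strata is left out of the adjustment set.

```python
PCS = ["PC1", "PC2", "PC3", "PC4"]  # first four principal components


def gwas_covariates(stratified_by=None):
    """Covariates for the per-outcome logistic GWAS.

    stratified_by: None (all patients), "stage", or "location".
    The stratifying variable is excluded from the adjustment set.
    """
    covariates = ["sex", "age"] + PCS
    if stratified_by != "stage":
        covariates.append("stage")
    if stratified_by != "location":
        covariates.append("location")
    return covariates
```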

Results
The demographic and clinical characteristics of the patients for each outcome analyzed are shown in Table 1. Overall, there were significant differences in age, stage, location, lymph node involvement, and metastasis for each outcome, but not in sex or histologic grade (Table 1).

Performance of Genetic Variants Previously Associated with CRC Outcomes
From the 63 SNPs previously associated with various CRC outcomes (Supplementary Table S1), 26 showed an AUC > 0.6 and a significant association (p < 0.05) with one or more outcomes analyzed in the present work (Table 2). In addition, the AUC of the SNPs improved with the inclusion of additional variables (sex, age, tumor stage, and genetic distance).
Table 2. Performance of SNPs previously associated with CRC outcomes. A1, tested allele; Carriers, % of the carriers of the tested allele that showed the outcome; Non-carriers, % of the non-carriers of the tested allele that showed the outcome; OR (95% CI), odds ratio and 95% confidence interval; Direction, whether the direction of the effect is the same as (Same) or different from (Diff) that in the study where the SNP was described ("-" for no data); p, p-value of the generalized linear regression; AUC SNP, AUC and 95% confidence interval with the carriership of the tested allele as predictor; AUC clinical, AUC and 95% confidence interval with sex, age, and stage as predictors; AUC Full, AUC and 95% confidence interval with the carriership of the tested allele, sex, age, stage, and the first four principal components of the genetic distance of individuals as predictors. Only SNPs with significant values and an AUC > 0.6 are shown.
The most significant SNP was rs13180087 T > C (Table 2), whose minor allele was more prevalent in patients with left colon tumors who did not receive chemotherapy and who relapsed after 5 years (OR = 17, p = 3 × 10−4). It was followed by rs17057166 C > T, whose minor allele was more prevalent in patients with stage I+II tumors who did not survive 1 year (OR = 4.3, p = 8 × 10−4) (Table 2).
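The Carriers, Non-carriers, and OR (95% CI) columns of Table 2 are related through a 2 × 2 table of carriership by outcome. The sketch below computes an unadjusted odds ratio with a Woolf (log-scale Wald) 95% CI from hypothetical counts; it does not apply a continuity correction, so it is undefined when any cell is zero. For a logistic model whose only predictor is carriership, exp(β) reproduces this cross-product odds ratio and its Wald interval; once covariates are added, the adjusted estimate generally differs.

```python
import math


def carrier_odds_ratio(carrier_events, n_carriers, noncarrier_events,
                       n_noncarriers, z=1.959964):
    """Odds ratio of the outcome in carriers vs non-carriers with a Woolf 95% CI.

    Undefined (division by zero) if any cell of the 2x2 table is zero.
    """
    a = carrier_events                     # carriers with the outcome
    b = n_carriers - carrier_events        # carriers without the outcome
    c = noncarrier_events                  # non-carriers with the outcome
    d = n_noncarriers - noncarrier_events  # non-carriers without the outcome
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts: 10 of 20 carriers and 10 of 60 non-carriers show the outcome.
or_, lo, hi = carrier_odds_ratio(10, 20, 10, 60)
print(f"Carriers {10 / 20:.0%}, non-carriers {10 / 60:.0%}, "
      f"OR = {or_:.1f} ({lo:.2f}-{hi:.2f})")
```

Wide intervals such as this one, produced by small cell counts, are one reason the very large ORs reported for some subgroups should be read with caution.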
Regarding the best performance, rs1555895 A > G had an AUC of 0.7 for identifying patients with rectal cancer who had not been treated with chemotherapy and remained without relapse for 5 years (Table 2). However, the AUC calculated with clinical variables (sex, age, and stage) outperformed the AUC using only the genetic variant (AUC = 0.76). In fact, in most cases, the clinical variables were more informative than the SNP alone, except for rs11246159 T > C in the 5-year relapse of patients with stage I+II tumors and rs885036 A > G in the 5-year relapse of patients with left colon tumors treated with capecitabine (Table 2). In addition, when genetic and clinical data were combined, the AUC exceeded the values obtained with either set separately, reaching values as high as 1 for rs17048372 G > T and rs1801265 A > G in the 5-year relapse of patients with right colon tumors treated with 5-fluorouracil (Table 2).
Moreover, some SNPs were associated with outcomes other than those previously associated with them (Table 2). For example, rs17057166 C > T and rs3781663 G > A had been associated with survival in rectal cancer, whereas in our cohort they were associated with survival in right or left colon cancer. The SNP rs1128503 A > G, previously associated with the toxicity of capecitabine, was associated with the success of 5-fluorouracil treatment. In addition, the effect of some genetic variants was in the opposite direction to that in the study where they were described (e.g., rs1573948 T > C, rs3781663 G > A, or rs1555895 A > G), or its direction differed depending on the outcome (e.g., rs11246159 T > C, rs885036 A > G, or rs7299460 C > T).
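Comparing the direction of an effect with the original study is only meaningful when both odds ratios refer to the same allele. The sketch below, with hypothetical argument names, shows one way to derive the Same/Diff label of Table 2 under that assumption: for a biallelic SNP, the published OR is inverted when the other allele was tested. Strand flips are assumed to have been resolved beforehand.

```python
def effect_direction(or_here, a1_here, or_published, a1_published):
    """Return "Same", "Diff", or "-" for the direction of effect of a biallelic SNP.

    If the two studies tested different alleles, the published OR is inverted
    so that both ORs refer to the same allele.
    """
    if or_published is None:
        return "-"  # no published estimate available
    if a1_here != a1_published:
        or_published = 1.0 / or_published
    return "Same" if (or_here > 1) == (or_published > 1) else "Diff"
```

For example, an OR of 2.0 for allele G here and a published OR of 1.5 for allele A correspond to opposite directions, because the published result implies an OR of about 0.67 for G.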

Discovery of Local Genetic Variants Associated with CRC Outcomes
Apart from analyzing the performance of SNPs described in the literature, we searched for SNPs associated with CRC outcomes in our cohort. We did not find any genome-wide significant SNPs (p < 5 × 10−8), but we found 49 suggestive (p < 5 × 10−6) loci associated with one or more CRC outcomes (Table 3).
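The two thresholds used above can be expressed as a simple classification. The constant and function names are illustrative; the comment on the origin of 5 × 10−8 reflects its usual justification as a Bonferroni correction of 0.05 for about one million independent common-variant tests in European-ancestry genomes.

```python
GENOME_WIDE = 5e-8  # conventional genome-wide significance threshold
SUGGESTIVE = 5e-6   # suggestive threshold used in this work


def classify_p(p):
    """Label a GWAS p-value with the significance tier it reaches."""
    if p < GENOME_WIDE:
        return "genome-wide significant"
    if p < SUGGESTIVE:
        return "suggestive"
    return "not significant"


# 5e-8 is usually motivated as 0.05 / ~1,000,000 independent tests. A naive
# Bonferroni correction over all 5,399,981 imputed SNPs would be stricter,
# because it ignores the linkage disequilibrium between neighboring SNPs.
naive_bonferroni = 0.05 / 5_399_981
```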
The most significant SNP was rs10845123 G > A, associated with 5-year survival (OR = 2.9, p = 9.6 × 10−9) and located in the KLRK1-AS1 gene. The next most significant SNPs were rs6889868 T > C, associated with 3-year survival (OR = 3.2, p = 6.4 × 10−7) and located in an intergenic region; rs61991339 T > C, associated with 1-year survival (OR = 4.2, p = 7.7 × 10−7) and located in the UNC79 gene; and rs6088387 G > T, associated with the use of chemotherapy (OR = 6.5, p = 7.9 × 10−7) and located in the RALY gene.
Finally, the suggestive SNPs associated with various CRC outcomes were mostly located in introns, upstream or downstream of genes, or in intergenic regions (Table 3). The exception was rs17821546 A > G, associated with 3-year survival in patients with stage I+II tumors (OR = 21.9, p = 2.2 × 10−6), which is located in the 3′ UTR of the SULT1C2 gene.

Discussion
In this study, we have analyzed the performance of known SNPs associated with CRC outcomes as predictors in our cohort and searched for new SNPs that could be used as predictors of CRC outcomes in our cohort.
We are aware that the sample size, especially for some outcomes, was limited. This limitation means that mainly the strongest effects (e.g., high ORs) are detected as significant and that sampling biases may arise. Therefore, in the analyses of previously known genetic markers, the p-values should be interpreted in that context. In addition, no signals reached genome-wide significance (p < 5 × 10−8), probably due to the sample size, although suggestive signals (p < 5 × 10−6) were detected. Therefore, the present study's results should be validated, and follow-up analyses in a larger cohort are needed. However, considering our previous findings on the risk of CRC in this cohort [25] and the possible use of genetic variants to tailor treatments [6], we consider this study a first step toward finding appropriate genetic markers to predict CRC outcomes in our population. In addition, in our previous studies of this population [25,34], we detected local genetic variants that could be informative but that could not be detected in broader cohorts. Moreover, we are aware that the genetic particularities of our population, resulting from its evolutionary history, limit the generalization of the results obtained in this work. Isolation and genetic drift have made the allele frequencies of the Basque population more similar to those of populations that lived in Europe in the Neolithic [35] or Iron Age [36] than to those of modern European populations, which were impacted by migrations associated with Steppe pastoralism. Therefore, the SNPs that could be useful in our population may not be relevant for other populations, as has been proposed previously to explain the differences in results between populations [6]. Although this is a limitation, it highlights the importance of analyzing local populations to assess the utility of known genetic markers and to find local genetic markers.
The genetic variants previously associated with CRC outcomes showed variable performance. Some of them performed well and, therefore, could be used to predict some outcomes. In addition, some SNPs helped predict outcomes other than those they were originally associated with. Notably, performance improved when additional variables were included. Thus, more genetic information is needed to make a good prediction, and clinical data should also be used for a robust prediction.
Moreover, we have detected genetic variants that could be useful in predicting CRC outcomes in our cohort.The most significant signal was detected in an SNP (rs10845123) associated with 5-year survival and located in the KLRK1-AS1 lncRNA.This lncRNA encodes a polypeptide regulated by TP53, which is involved in cell proliferation through its regulation of the cell cycle in DNA damage response [37].Another significant signal was the SNP rs6088387, whose minor allele is associated with the risk of being treated with chemotherapy.This SNP is located in the RALY gene, a gene associated with CRC aggressiveness, and its expression is associated with a poor prognosis in CRC [38].
Other SNPs related to various outcomes were located in genes previously associated with CRC. For example, GBP3 has been found to be more highly expressed in CRC, although it is not a good predictor of response to immune checkpoint blockade [39]. The expression of TSPAN11 has been associated with the stemness and stromal scores of CRC tumors [40], and TSPAN11 has been included in a model for prognosis prediction in CRC through its role in cell invasion [41]. PTPRM has been suggested to play a role in colorectal tumorigenesis since it regulates cell growth, and its loss promotes the growth of oncogenic cells [42]. For other genes, a role in other cancers has been proposed. For example, the overexpression of SULT1C2 has been associated with the growth, survival, migration, and invasiveness of hepatocellular carcinoma cells [43]. The expression of PUS1 is associated with overall survival in hepatocellular carcinoma [44], and RBFOX3 has been described to play a role in chemosensitivity to 5-fluorouracil in hepatocellular carcinoma [45].
On the whole, these results suggest that the genetic variants detected in our cohort could be feasible candidates to assist in the prediction of the outcome since the genes where they are located are associated with various biological mechanisms of CRC or other cancers.
It should be pointed out that some of the SNPs detected, both in the analysis of SNPs previously associated with CRC outcomes and in the analysis of local genetic variants, were significant only in a specific stage or location (e.g., rs13180087 T > C or rs17057166 C > T), or the results for all patients together were driven by a specific stage or location (e.g., rs1347485 A > G or rs4712605 A > G). For CRC risk, it has been observed that the genetic background differs depending on tumor location [25,46]. Thus, the use of these SNPs should take the stage and location of the tumor into account to make an accurate prediction about a given outcome.

Conclusions
In conclusion, we have found that 26 genetic markers previously associated with CRC outcomes could be good predictors in our population and that their predictive accuracy improved when clinical data were added. In addition, we detected 49 local genetic variants that could be feasible markers for several CRC outcomes; however, considering our limited sample size, further validation is needed to assess their utility as predictors.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/cancers15194688/s1, Supplementary Table S1: SNPs previously associated with CRC outcomes analyzed in this work.
Informed Consent Statement: Informed consent was obtained from all subjects involved in this study.