Predicting Dihydropyrimidine Dehydrogenase Deficiency and Related 5-Fluorouracil Toxicity: Opportunities and Challenges of DPYD Exon Sequencing and the Role of Phenotyping Assays

Deficiency of dihydropyrimidine dehydrogenase (DPD), encoded by the DPYD gene, is associated with severe toxicity induced by the anti-cancer drug 5-fluorouracil (5-FU). DPYD genotyping of four recommended polymorphisms is widely used to predict toxicity, yet its predictive power is limited. The increasing availability of next-generation sequencing (NGS) will allow screening for rare variants, predicting a larger fraction of DPD deficiencies. Genotype–phenotype correlations were investigated by performing DPYD exon sequencing in 94 patients assessed for DPD deficiency by the 5-FU degradation rate (5-FUDR) assay. Association of common variants with 5-FUDR was analyzed with the SNPStats software. Functional interpretation of rare variants was performed by in-silico analysis (using the Human Splicing Finder (HSF) system and PredictSNP) and literature review. A total of 23 rare variants and 8 common variants were detected. Among common variants, a significant association was found between homozygosity for rs72728438 (c.1974+75A>G) and decreased 5-FUDR. Haplotype analysis did not detect significant associations with 5-FUDR. Overall, in our sample cohort, NGS exon sequencing allowed us to explain 42.5% of the total DPD deficiencies. NGS markedly improves the prediction of DPD deficiencies, yet a broader collection of genotype–phenotype association data is needed to enable the clinical use of sequencing data.


Introduction
Fluoropyrimidines (FP), including the antimetabolite 5-fluorouracil (5-FU) and its prodrugs tegafur and capecitabine, are anti-cancer drugs widely used to treat solid tumors, mainly colorectal cancer.
Severe toxicity (grade 3-4), including gastrointestinal reactions, myelosuppression, mucositis, nervous system toxicity and cardiotoxicity, develops in up to 30% of patients and is fatal in about 1% of cases [1][2][3][4][5]. Considering the hundreds of thousands of cancer patients treated with FP every year [4,5], pre-emptive prediction and early recognition of severe toxicity are key to saving patients' lives. A major biological mechanism underlying 5-FU toxicity is impaired drug metabolism due to deficient activity of the enzyme dihydropyrimidine dehydrogenase (DPD, encoded by the DPYD gene), which catabolizes more than 80% of the administered FP to the inactive metabolite fluorodihydrouracil (FDHU). DPD deficiency leads to increased 5-FU plasma concentrations and, since the 1980s, has been recognized as a main feature of 5-FU-treated patients experiencing severe adverse events [6][7][8][9]. This opened the way to pre-emptive testing of DPD activity (i.e., phenotypic assessment) to identify patients at high risk of toxicity. Two main analytical approaches to DPD phenotyping have been developed and successfully employed to improve FP safety: measurement of the uracil/dihydrouracil ratio in plasma, which estimates DPD activity from the levels of the endogenous DPD substrate uracil and its metabolite dihydrouracil, and direct measurement of DPD enzymatic activity in peripheral blood mononuclear cells by biochemical assays [10][11][12][13][14]. Unfortunately, these methods are not widely available in clinical laboratories, since they require specialized equipment (such as liquid chromatography and mass spectrometry) and are usually based on in-house protocols [10][11][12][13][14].
Screening for four DPYD variants (*2A, *13, HapB3 and D949V) is recommended by several medicines agencies and international expert panels, such as the Clinical Pharmacogenetics Implementation Consortium (CPIC) and the Dutch Pharmacogenetics Working Group (DPWG), which have also developed specific guidelines for FP dose adjustment in carriers [27][28][29][30]. Although DPYD genotyping has become widespread in clinical diagnostic labs, it should be kept in mind that the population frequency of the screened variants is around 1-2%, whereas DPD deficiency is present in up to 5% of the general population [4,31]. Thus, a significant fraction of DPD deficiencies, caused by other, rare variants, cannot be predicted by the current genotyping approach [31][32][33].
Presently, the decreasing cost and growing availability of Next Generation Sequencing (NGS) in clinical diagnostic labs are enabling screening of the entire DPYD coding region (or the full gene), allowing the detection of additional rare variants (mutations) [34] that may cause DPD impairment and 5-FU toxicity. However, to establish the pathogenicity, and thus the clinical utility, of rare variants detected by NGS, new genotype-phenotype correlations must be described and analyzed.
In this study, we performed DPYD exon sequencing (including intron/exon boundaries) in a cohort of 94 subjects who had previously undergone DPD phenotyping with a biochemical assay, the 5-FU degradation rate (5-FUDR) [12]. DPD deficiency is defined by a 5-FUDR value below the fifth percentile of the distribution in the general population [32,33]. The study cohort was purposely selected to include most of the DPD deficiency cases (N = 40) identified by previous phenotyping of about 1000 patients [32,33], with the aim of detecting specific associations between rare or novel DPYD variants and decreased DPD activity.
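The phenotypic definition above reduces to a threshold rule. The minimal sketch below assigns PM or NM status from a 5-FUDR value using the ≤ 0.85 ng/mL/10⁶ cells/min cut-off stated in the Discussion (the text describes the cut-off both as "below the fifth percentile" and as "≤ 0.85"; the sketch follows the latter), and shows how a fifth-percentile cut-off could be derived from a reference distribution. The function names and values are hypothetical, and the interpolation used by Python's `statistics.quantiles` is only one of several percentile conventions, not necessarily the one used to set the published threshold.

```python
import statistics
from typing import Sequence

# Cut-off stated in the text: 5-FUDR <= 0.85 ng/mL/10^6 cells/min -> PM.
PM_CUTOFF = 0.85


def fifth_percentile(reference: Sequence[float]) -> float:
    """Hypothetical helper: 5th percentile of a reference 5-FUDR distribution.

    statistics.quantiles(..., n=20) returns the 19 cut points that split the
    data into 20 equal groups; the first cut point is the 5th percentile.
    """
    if len(reference) < 2:
        raise ValueError("need at least two reference values")
    return statistics.quantiles(reference, n=20)[0]


def classify_5fudr(value: float, cutoff: float = PM_CUTOFF) -> str:
    """Hypothetical helper: 'PM' (poor metabolizer) at or below the cut-off, else 'NM'."""
    if value < 0:
        raise ValueError("5-FUDR cannot be negative")
    return "PM" if value <= cutoff else "NM"


if __name__ == "__main__":
    for v in (0.52, 0.85, 1.40):  # illustrative values, not study data
        print(v, classify_5fudr(v))
```

A fixed cut-off is convenient at the bench, but it inherits whatever reference population and percentile convention were used to derive it.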

Results
DPYD sequencing detected 31 germline variants in the overall population, of which 23 were rare (observed minor allele frequency < 0.05) and 8 were common (Table 1 and Figure 1). Thirteen variants were present in both the poor metabolizer (PM, N = 40) and normal metabolizer (NM, N = 54) groups, 11 were detected exclusively in the PM group and 7 exclusively in the NM group. In the PM group, 8 variants were intronic and 16 exonic (13 missense, 2 synonymous and 1 frameshift); in the NM group, 11 variants were intronic and 9 exonic (6 missense and 3 synonymous). A wild-type sequence was found in 3/40 (7.5%) PM subjects and 7/54 (12.96%) NM subjects. All variants were in Hardy-Weinberg (HW) equilibrium except the *13 SNP (rs55886062), which was detected only in the PM group and deviated from HW equilibrium (p = 0.038). This result is consistent with the known association of the *13 allele with poor DPD activity [22][23][24].
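The HW test in this study was run in SNPStats. To show what such a test checks, the sketch below implements the classical one-degree-of-freedom chi-square goodness-of-fit test from genotype counts. This is an illustrative substitute, not necessarily the procedure SNPStats applies, and with the small expected counts typical of rare variants an exact test is generally preferable. The function name and example counts are hypothetical.

```python
import math
from typing import Tuple


def hwe_chi_square(n_hom_ref: int, n_het: int, n_hom_alt: int) -> Tuple[float, float]:
    """Chi-square (1 df) test of Hardy-Weinberg equilibrium from genotype counts.

    Returns (chi-square statistic, p-value).
    """
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p                             # alternate allele frequency
    observed = (n_hom_ref, n_het, n_hom_alt)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    # Skip genotype classes with zero expectation (monomorphic site).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    print(hwe_chi_square(25, 50, 25))  # exact HW proportions -> (0.0, 1.0)
    print(hwe_chi_square(30, 0, 10))   # no heterozygotes -> strong deviation
```

An excess of homozygotes, as expected when carriers of a deficiency allele are over-sampled in a PM-enriched cohort, shows up as a large statistic and a small p-value.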

Haplotype analysis testing interactions among the seven common SNPs did not detect significant associations with the mean 5-FUDR.
The potential functional effect of rare variants (observed minor allele frequency < 0.05) was investigated in silico using the Human Splicing Finder System (Genomnis, Marseille, France), to predict the impact of intronic variants on splicing, and PredictSNP [35], to evaluate the impact of missense variants. None of the detected intronic variants was predicted to affect splicing, except c.234-138G>A and c.2300-39G>A, which were predicted to alter the exonic splicing enhancer/exonic splicing silencer (ESE/ESS) motif ratio and to activate a cryptic splice donor site, respectively. However, both variants were detected in NM subjects.
Regarding the missense variants, four were predicted to be deleterious (Y211C, K259E, P519S, G539R) and three non-deleterious (W475R, V515I, L785M). All missense variants predicted to be deleterious were found only in PM subjects.

Discussion
The phenotypic 5-FUDR assay was previously established as clinically useful to manage FP treatment. By definition, 5% of the general population has 5-FUDR values ≤ 0.85 ng/mL/10⁶ cells/min and is classified as PM [32]. We have previously shown that PM subjects have a significantly increased risk of developing severe 5-FU toxicity, and correlated the presence of known DPYD polymorphisms with both 5-FU toxicity and low 5-FUDR values [32,33,[36][37][38][39]. Our previous results confirmed that, despite the enormous benefits in terms of treatment safety brought by systematic genotyping of the recommended SNPs, a large fraction of DPD deficiencies remains unpredictable [32,33]. Thus, using NGS to characterize larger DPYD regions is attractive and increasingly feasible. However, broad DPYD sequencing will drastically increase the number of reported variants, which will require functional interpretation before they can be applied to patient therapy management.
In order to highlight novel genotype-phenotype correlations and contribute to the functional assignment of DPYD genetic variants, we performed exon sequencing in a patient cohort enriched in DPD-deficient patients (5-FUDR PM group).
Among the eight common variants detected in the overall sample, we found a statistically significant association between the GG genotype at the polymorphic site c.1974+75A>G (rs72728438) and low 5-FUDR (Figure 1). This intronic variant has previously been associated with decreased DPD activity [40], and other studies described its presence in patients with low DPD activity, although the association did not reach statistical significance [41]. Recently, a study of expression quantitative trait loci (eQTLs), i.e., genetic variants affecting gene transcription and transcript stability [42], found that rs72728438 is in high LD (r² > 0.94) with the intronic DPYD variant rs59353118, which is an eQTL significantly associated with reduced DPYD expression and with rs12022243 and rs72728443. The latter is located in an enhancer region and spans a p53 binding site. Thus, LD with distant causative polymorphisms may explain the association between the intronic rs72728438 and the poor 5-FU metabolism observed in the present and previous reports [40,41].
Confirmation of impaired DPD activity in homozygous carriers of rs72728438 would be of paramount importance: in our sample cohort, the GG genotype was present in 5/40 (12.5%) PM subjects, none of whom carried other no-function variants, so validation of this marker could substantially improve the genotype-based prediction of DPD deficiency.
Other DPYD variants previously reported as deleterious can explain 7.5% of the total DPD deficiencies (3/40 PM cases): c.2579delA (Q860fs, rs746991079), a frameshift variant resulting in protein truncation, previously found in individuals with DPD deficiency [43,44]; the Y211C allele, associated with a marked reduction of DPD activity (12.5-25% activity compared to the wild type) in an in-vitro assay using a recombinant mutant protein [34,45]; and the K259E allele, previously detected in a cohort of 5-FU-treated patients experiencing toxicity [34]. One additional PM case (2.5%) could be attributed to a haplotype (rs1801160, rs1801265, rs2297595) that we previously found to be associated with significantly decreased 5-FUDR [39].
Among the remaining 57.5% of PM cases (23/40), one subject carried the V515I variant, reported as deleterious by Hishinuma et al. [46] (35% DPD activity compared to the wild type) but as functional by Offer et al. using a different in-vitro assay [45], and predicted as non-deleterious by in-silico analysis; one subject carried the G539R variant, reported as functional by Offer et al. [45] but predicted as deleterious by in-silico analysis; one subject carried the novel W475R variant (no rsID available); and one subject carried the P519S variant (rs672601282), predicted by in-silico analysis as non-deleterious and deleterious, respectively. The other PM subjects carried various combinations of known polymorphisms with no or uncertain effect on DPD activity, or had a wild-type sequence (N = 3).
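The conflicting functional calls listed above can be made explicit by tabulating the evidence per variant. The sketch below is an illustrative data structure, not a published curation scheme: it encodes only the calls quoted in this section and flags variants whose sources disagree. Collapsing "deleterious" versus "functional"/"non-deleterious" into a single impaired/not-impaired flag is a simplifying assumption, and all names are hypothetical.

```python
from typing import Dict, List

# Functional calls as summarized in the text (source -> call).
EVIDENCE: Dict[str, Dict[str, str]] = {
    "V515I": {
        "in vitro, Hishinuma et al. [46]": "deleterious",
        "in vitro, Offer et al. [45]": "functional",
        "in silico, PredictSNP": "non-deleterious",
    },
    "G539R": {
        "in vitro, Offer et al. [45]": "functional",
        "in silico, PredictSNP": "deleterious",
    },
    "W475R": {"in silico, PredictSNP": "non-deleterious"},
    "P519S": {"in silico, PredictSNP": "deleterious"},
}

# Assumption: collapse each call to "does this source predict impaired function?".
IMPAIRED = {"deleterious": True, "functional": False, "non-deleterious": False}


def discordant_variants(evidence: Dict[str, Dict[str, str]]) -> List[str]:
    """Return variants whose sources disagree on impaired vs. preserved function."""
    flagged = []
    for variant, calls in evidence.items():
        verdicts = {IMPAIRED[call] for call in calls.values()}
        if len(verdicts) > 1:
            flagged.append(variant)
    return flagged


if __name__ == "__main__":
    print(discordant_variants(EVIDENCE))  # ['V515I', 'G539R']
```

Note that W475R and P519S are not flagged only because a single source is available for each; absence of discordance is not confirmation.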
Summing up these observations, we can roughly compare the common DPYD genotyping strategy, based on testing a few recommended genetic markers, with the diagnostic scenario opened by the NGS approach. In our sample cohort, pre-emptive genotyping limited to the recommended polymorphisms *2A, *13, HapB3 and D949V would have identified only 20% of DPD deficiencies, whereas exon sequencing identified an additional 22.5% of subjects carrying variants that provide a reasonable "warning" of DPD deficiency.
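These percentages follow from simple counts over the 40 PM subjects. The sketch below reproduces the arithmetic. The count of 8 subjects for the recommended variants is back-calculated from the reported 20%, and the categories are treated as mutually exclusive, as the text implies ("no other no-function variants", "other", "additional").

```python
from fractions import Fraction

N_PM = 40  # DPD-deficient (PM) subjects in the cohort

# PM subjects explained by each class of finding (counts from the text;
# 8 is back-calculated from the reported 20%).
explained = {
    "recommended variants (*2A, *13, HapB3, D949V)": 8,
    "rs72728438 GG genotype": 5,
    "previously reported deleterious variants": 3,
    "rs1801160/rs1801265/rs2297595 haplotype": 1,
}


def percent(count: int, total: int = N_PM) -> Fraction:
    """Exact percentage as a Fraction, avoiding floating-point rounding."""
    return Fraction(100 * count, total)


recommended_only = percent(explained["recommended variants (*2A, *13, HapB3, D949V)"])
total_explained = percent(sum(explained.values()))
added_by_sequencing = total_explained - recommended_only
unexplained = 100 - total_explained

if __name__ == "__main__":
    for label, value in [("recommended only", recommended_only),
                         ("added by sequencing", added_by_sequencing),
                         ("total explained", total_explained),
                         ("unexplained", unexplained)]:
        print(f"{label}: {float(value)}%")
```

Running it prints 20.0%, 22.5%, 42.5% and 57.5%, matching the figures in the Abstract and Discussion.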
On the other hand, DPYD exon sequencing did not reveal a clear genetic determinant for more than half of the analysed cases of DPD deficiency.
Sequencing of the full DPYD gene would also allow the detection of deleterious variants in regulatory regions. Nevertheless, the interpretation of sequencing results remains an open issue: since most variants detected by sequencing are rare, no clear genotype-phenotype association data are available to support clinical decisions on 5-FU treatment. In-silico prediction and in-vitro expression/activity assays are good strategies for rapid functional assessment of novel DPD variants, but, as exemplified in our study by the G539R and V515I variants, in-silico predictions and in-vitro assays may give discordant results, as may in-vitro assays using different systems. Thus, genotype-phenotype association studies remain the main route to clinically useful data. The increasing adoption of NGS for DPYD screening is expected to expand the available data, enabling statistical analyses that recognize strong genotype-phenotype associations. In this scenario, we would highlight that in this type of study, choosing to study a "biochemical DPD phenotype" (i.e., a measure of the patient's DPD activity level) rather than a "clinical DPD phenotype" (i.e., a measure of toxicity following 5-FU treatment) may be advantageous at a preliminary stage. This is because the biochemical DPD phenotype can be measured in the general population, regardless of the presence of cancer, drastically increasing the number of subjects who can be screened for genotype-phenotype associations. However, we are aware that clinical validation of any genetic marker identified by such an approach is essential, and that the lack of data on FP-induced toxicity in this study is an objective limitation.

DPYD Exon Sequencing
Genomic DNA was isolated from 200 µL of EDTA-anticoagulated peripheral blood using the QIAsymphony automated extractor with the QIAsymphony DSP DNA Mini Kit (Qiagen, Hilden, Germany). DPYD target regions, including exons and intron/exon boundaries, were amplified with the Ion AmpliSeq™ Library Kit 2.0 (ThermoFisher Scientific, Waltham, MA, USA) according to the manufacturer's instructions. Libraries were then diluted and subjected to templating and chip loading on the Ion Chef™ Instrument with the Ion 510™ & Ion 520™ & Ion 530™ Kit-Chef; NGS was then performed on the Ion S5 System, and data were analyzed using Ion Reporter Software version 5.18 (ThermoFisher Scientific, Waltham, MA, USA).

In-Silico Prediction of Variants' Effect
Functional consequences of rare genetic variations in the intronic regions (intron/exon boundaries) were analysed using the Human Splicing Finder System (Genomnis, Marseille, France), which evaluated their potential effects on all splicing signals including acceptor and donor sites, branch points and auxiliary splicing signals, such as exonic splicing enhancer/silencer (ESE/ESS).
Functional consequences of rare genetic variations in the exonic regions were analysed using the PredictSNP algorithm, which combines data from different well-established prediction tools to predict the impact of amino acid substitutions on protein function [35]. Rare variants were defined as variants with an observed minor allele frequency < 0.05.
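A minimal sketch of the rare/common split, assuming per-variant genotype counts and the 0.05 observed-MAF threshold stated above. Because the cohort was deliberately enriched in DPD-deficient subjects, observed frequencies need not match population frequencies. The function names are hypothetical. The text defines rare as MAF < 0.05 and common as MAF > 0.05; the sketch assigns a MAF of exactly 0.05 to "common" as an assumption.

```python
def observed_maf(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """Observed minor allele frequency from diploid genotype counts."""
    n_alleles = 2 * (n_hom_ref + n_het + n_hom_alt)
    if n_alleles == 0:
        raise ValueError("no genotypes supplied")
    alt_freq = (2 * n_hom_alt + n_het) / n_alleles
    # The minor allele is whichever allele is less frequent in this sample.
    return min(alt_freq, 1.0 - alt_freq)


def frequency_class(maf: float, threshold: float = 0.05) -> str:
    """'rare' below the threshold, otherwise 'common' (boundary handling is an assumption)."""
    return "rare" if maf < threshold else "common"


if __name__ == "__main__":
    # One heterozygous carrier among 94 subjects: MAF = 1/188 (about 0.005).
    maf = observed_maf(93, 1, 0)
    print(round(maf, 4), frequency_class(maf))  # 0.0053 rare
```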

Statistics
Numerical variables were expressed as mean ± standard deviation. Tests for deviation from Hardy-Weinberg (HW) equilibrium, analysis of genotype and allele distributions and association analysis with 5-FUDR values were performed using the SNPStats online tool [47,48]. Single-SNP association with the response variable 5-FUDR was tested by linear regression under dominant, recessive, co-dominant and log-additive models. Haplotype association with 5-FUDR was tested by linear regression under a log-additive model. All analyses were adjusted for age and sex. No correction for multiple testing was applied, and p values below 0.05 were considered significant. Single-SNP and haplotype analyses were performed only on common variants (N = 7, observed MAF > 0.05, Table 1).
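The four inheritance models differ only in how a genotype is turned into regression predictors. The sketch below shows one conventional coding of a genotype (number of minor alleles: 0, 1 or 2) under each model. The resulting columns would enter a linear model of 5-FUDR together with age and sex. This is an illustrative encoding, not SNPStats source code, and the function name is hypothetical. Under the recessive model only minor-allele homozygotes are coded 1, which is the contrast relevant to a genotype such as rs72728438 GG, assuming G is the minor allele in the cohort.

```python
from typing import List

MODELS = ("codominant", "dominant", "recessive", "log-additive")


def encode_genotype(minor_alleles: int, model: str) -> List[int]:
    """Code a genotype (0, 1 or 2 copies of the minor allele) under a genetic model.

    codominant   -> two indicators: [heterozygote, minor homozygote]
    dominant     -> [1] if at least one minor allele
    recessive    -> [1] only for minor homozygotes
    log-additive -> [number of minor alleles]
    """
    if minor_alleles not in (0, 1, 2):
        raise ValueError("minor allele count must be 0, 1 or 2")
    if model == "codominant":
        return [int(minor_alleles == 1), int(minor_alleles == 2)]
    if model == "dominant":
        return [int(minor_alleles >= 1)]
    if model == "recessive":
        return [int(minor_alleles == 2)]
    if model == "log-additive":
        return [minor_alleles]
    raise ValueError(f"unknown model: {model!r}; expected one of {MODELS}")


if __name__ == "__main__":
    for g in (0, 1, 2):
        print(g, {m: encode_genotype(g, m) for m in MODELS})
```

The co-dominant model spends two degrees of freedom, whereas the others spend one. This is why a recessive effect confined to homozygotes can reach significance under the recessive model while appearing diluted under the log-additive one.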
The presence of LD among all variants was evaluated using the web-based application LDlink [49]. The software calculates D prime (D′) and R squared (r²) statistics using data from the 1000 Genomes Project [50]. The LD analysis was performed in the European population.
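To clarify what the two LD statistics measure, the sketch below computes them from a two-locus haplotype frequency and the two allele frequencies using the standard definitions (D = p_AB − p_A·p_B, D′ = |D|/D_max, r² = D²/(p_A(1−p_A)p_B(1−p_B))). It does not reproduce LDlink's internals, and the function name and input values are hypothetical. An r² close to 1, such as the r² > 0.94 reported in the Discussion, means one variant is an almost perfect proxy for the other.

```python
from typing import Tuple


def ld_stats(p_ab: float, p_a: float, p_b: float) -> Tuple[float, float]:
    """Return (D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a, p_b: frequencies of alleles A and B
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient
    # D_max depends on the sign of D (Lewontin's normalization).
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2


if __name__ == "__main__":
    print(ld_stats(0.3, 0.3, 0.3))   # complete LD, equal frequencies: both close to 1.0
    print(ld_stats(0.06, 0.2, 0.3))  # haplotype frequency = product: both close to 0.0
```

D′ = 1 indicates only that at most three of the four haplotypes occur. A high r² additionally requires similar allele frequencies, which is why r² is the statistic used to judge whether one SNP can tag another.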

Conclusions
The current DPYD genotyping approach to identifying patients at high risk of severe 5-FU toxicity is limited to screening four recommended variants, which detects only a minor fraction of actual DPD deficiencies. The advent of cost-effective NGS analysis will allow the detection of a large number of rare DPYD variants and is expected to greatly improve the predictive power of genetic testing. However, a prerequisite for the full implementation of DPYD NGS analysis in clinical diagnostics is the accumulation of further genotype-phenotype association studies to unambiguously define the functional impact of rare variants. In this scenario, we would highlight the key role of DPD phenotyping assays: compared with clinical phenotyping (e.g., response to 5-FU treatment), DPYD sequencing in a target population identified as "DPD-deficient" by biochemical phenotyping is simpler and may accelerate the enrollment of cases, increasing the available sample size. Such preliminary identification of novel pharmacogenomic markers would in turn facilitate their clinical validation.

Informed Consent Statement:
Written informed consent has been obtained from the patients to publish this paper.

Data Availability Statement:
The authors confirm that the data supporting the findings of this study are available within the article. Raw sequencing data are available from the corresponding author on request.

Conflicts of Interest:
The authors declare no conflict of interest.