Added Value of Clinical Sequencing: WGS-Based Profiling of Pharmacogenes.

Although several pharmacogenetic (PGx) predispositions affecting drug efficacy and safety are well established, drug selection and dosing as well as clinical trials are often performed in a non-pharmacogenetically-stratified manner, ultimately burdening healthcare systems. Pre-emptive PGx testing offers a solution which is often performed using microarrays or targeted gene panels, testing for common/known PGx variants. However, as an added value, whole-genome sequencing (WGS) could detect not only disease-causing but also pharmacogenetically-relevant variants in a single assay. Here, we present our WGS-based pipeline that extends the genetic testing of Mendelian diseases with PGx profiling, enabling the detection of rare/novel PGx variants as well. From our in-house WGS (PCR-free 60× PE150) data of 547 individuals we extracted PGx variants with drug-dosing recommendations of the Dutch Pharmacogenetics Working Group (DPWG). Furthermore, we explored the landscape of DPWG pharmacogenes in gnomAD and our in-house cohort as well as compared bioinformatic tools for WGS-based structural variant detection in CYP2D6. We show that although common/known PGx variants comprise the vast majority of detected DPWG pharmacogene alleles, for better precision medicine, PGx testing should move towards WGS-based approaches. Indeed, WGS-based PGx profiling is not only feasible and future-oriented but also the most comprehensive all-in-one approach without generating significant additional costs.


Introduction
Pharmacogenetics is primarily concerned with how genetic variation affects individual drug response [1]. In the current genomics era, technological advances allow the unprecedented implementation of pharmacogenetics and its importance is becoming increasingly evident. Less than 10% of drugs reach approval [2]; the costs of drug development, following Eroom's law, have risen to almost €3 bn per marketed drug [3]; healthcare costs are increasing; and the number of annual deaths and costs due to adverse drug events (ADEs) are estimated to be 197'000 and €79 bn, respectively, in the EU [4]. In the US, the annual cost caused by nonoptimized medical therapy is estimated to be approximately $530 bn [5]. In light of these facts, there is a need for improvement in both drug development and healthcare systems. Pharmacogenetics may provide a solution to tackle these issues in the form of pre-emptive pharmacogenetic (PGx) testing for drug selection and dosing as well as PGx-based stratification of clinical trials. While in oncology pharmacogenetics has already interpreted according to the quarterly-updated DPWG guidelines. Similar to the PREPARE trial [9,10], our expanded pipeline outputs individualized treatment recommendations on a PGx report (Figure 1b) as well as on a PGx profile (Figure 1c) in credit card format (medication safety card, MSC), both of which can accordingly be used for better patient care. Currently, our pipeline analyzes the most established 45 single nucleotide variants (SNVs) and insertions/deletions (indels) in 11 genes as well as an HLA-B*5701 tagging variant and the deletion/duplication of CYP2D6, providing DPWG recommendations for 77+X (X = oral contraceptives) drugs (hereafter referred to as DPWG variants). As additional pharmacogenetically-relevant variants are expected to be described in the future, the MSC contains a QR code, providing access to an appropriate portal (accessible online or via app), containing up-to-date recommendations for personalized drug selection and dosing. Thus, our expanded pipeline not only allows WGS-based diagnosis but also individualized PGx recommendations for drug selection and dosing.
Other less common DPWG variants, however, are not present in our primarily Caucasian in-house WGS cohort, likely due to the relatively small cohort size or because variants are predominantly detected in different subpopulations such as the CYP2C9*5 allele in individuals of African descent (Supplementary Table S1). In contrast, the variants CYP2C9*2 and CYP2C9*3 are significantly more frequent in the European subpopulations of gnomAD exomes v2.1.1 and gnomAD genomes v3 compared to the African subpopulation (Supplementary Table S1). Overall, however, the contributions are comparable among the three databases and with previous estimates [13,19].
Moreover, in our in-house cohort of 547 genomes, we screened for DPWG and LOF/missense variants occurring in the same gene. In our cohort, 19 individuals harbor such co-occurring variants, in all but one of which the additional LOF/missense variant may also cause decreased or no enzyme function. One individual harbors not only the allele CYP2C19*17 (c.-806C>T) causing the ultrarapid metabolizer phenotype, but also the allele CYP2C19*2 causing poor metabolizer phenotype as well as the hitherto undescribed CYP2C19 splicing variant c.643-2A>T likely disrupting gene function (cf. possibly LOF). Although phasing was not possible for these variants, not considering the novel splicing variant could result in a misclassification of the CYP2C19 metabolizer status. Furthermore, because many of our patients might be treated with an anticoagulant due to their cardiovascular phenotype, we screened for CYP2C9 and VKORC1 variants affecting phenprocoumon/warfarin dosing. Thereby, we detected 113 individuals homozygous for the high phenprocoumon/warfarin sensitivity variant VKORC1*2 as well as 11 and 2 individuals homozygous for the poor metabolizer variants CYP2C9*2 and CYP2C9*3, respectively. We also detected 5 individuals homozygous for VKORC1*2 and CYP2C9*2, requiring a reduction of the phenprocoumon/warfarin starting dose to~35% compared to VKORC1*1/CYP2C9*1 individuals [20].

Comparison of CYP2D6 Calling Tools from WGS Data
Three command-line-based bioinformatic tools, Astrolabe (previously Constellation) [15], Aldy [16], and Stargazer [17] were used to call CYP2D6 variants, including SVs. Using downloaded genetic reference data, we compared the CYP2D6 variant calls of these three tools to the GeT-RM 2019 consensus genotypes [21]. As shown in Table 2, of 21 samples Stargazer called 11, Astrolabe 12, and Aldy 19 correctly (note that for NA18524 Aldy detected all existing star allels but not in the right diplotype). The number of incorrect calls seems to be linked to samples with more than two star alleles. In the sample NA18519, Aldy and Stargazer detected the *106 star allele listed as *1 in the GeT-RM 2019 consensus genotype, which we confirmed as *106 by manual evalaution of the BAM file using the Integrative Genomics Viewer [22]. As Aldy reached the highest accuracy, we analyzed 547 WGS samples (primarily European descent) using this tool (Supplementary Figure S3). While the two most frequently detected CYP2D6 star alleles (*1, 351 alleles; *2, 187 alleles) encode a normally functioning enzyme, the most frequently detected alleles with decreased (*41, 93 alleles; *10, 25 alleles) or no function (*4, 123 alleles; *68+4, 71 alleles; *5, 30 alleles) together amount to approximately 30% of the CYP2D6 star alleles in our cohort.

Discussion
In this work, we outline our first-of-its-kind approach describing seamless integration of PGx profiling into our WGS-based diagnostic pipeline for rare Mendelian disorders, without generating significant additional costs. By analyzing our in-house WGS cohort as well as the largest publicly available population-based dataset gnomAD, we show that variants with DPWG recommendations comprise the vast majority of detected pharmacogene alleles and that individuals harbor at least one pharmacogenetically-actionable variant. Moreover, our results show that even the highly polymorphic, pseudogene-burdened pharmacogene CYP2D6 may be accurately genotyped using short-read WGS, indicating that it is suitable for both diagnostics and PGx profiling in a single assay. Several discussion points and conclusions emerge from our results.
First, once a diagnosis is provided, the next step for adequate disease management ideally is a targeted medical therapy, tailored to the individual's PGx predispositions, if available. In an optimal scenario, information regarding PGx predisposition is available prior to medical therapy. With the advent of NGS in the form of WES and WGS, unprecedented amounts of genetic data are being generated, and primarily WGS represents an often-untapped data source, as an individual's complete PGx profile may be assessed while generating minimal additional costs. Due to decreasing sequencing costs, WGS is not prohibitively expensive for routine application and 60× WGS represents the superior solution (compared to 30× WGS and WES, Supplementary Figure S2) not only for diagnostics but also for PGx testing [14,23]. We substantiate this notion by showing that several known intronic or promoter DPWG variants (CYP2C19*17, CYP3A5*3, UGT1A1*28/*37, VKORC1*2) are not covered in gnomAD exomes due to insufficient WES capturing (depending on the used WES capturing kit), and thus incomplete treatment recommendations may be provided based on WES. Using one of several existing NGS panels, including PGRNseq [7], AmpliSeq Pharmacogenomics Research Panel (thermofisher.com), or VeraCode ADME Core Panel (illumina.com), these variants indeed would be interrogated, but at the cost of requiring an additional test. Considering that every individual on average harbors six pharmacogenetically actionable variants, we identified several patients, who require significant PGx-based dosage adjustments, which has been shown to optimize treatment outcome for instance for the anticoagulants phenprocoumon or warfarin [24].
Second, the recent technological advances have caused a paradigm shift and, currently, interpretation of (pharmaco)genetic data represents the greater challenge than their generation. In order to streamline analysis and facilitate interpretation, most providers resort to genotyping of a number of variants, of which effects on phenotype are established and translated into clear clinical guidelines. The caveat of this approach is that only the small subset of variants in the genes are interrogated, potentially missing yet undescribed or rare variants, resulting in incorrect drug recommendations. Indeed, a rare/novel LOF variant occurring in cis with an ultrarapid-metabolizer variant would lead to misclassification as ultrarapid metabolizer and hence lead to incorrect PGx recommendations. To estimate such cases, in gnomAD and our in-house WGS cohort we assessed the relative allele frequencies of DPWG and LOF/missense variants highly likely to disrupt the function of DPWG pharmacogenes. Thereby, we showed that the 45 analyzed DPWG variants, when combined, indeed comprise~98% of detected pharmacogene alleles, confirming and expanding the results of two recent studies analyzing the much smaller datasets of the 1000 Genomes Project, Exome Sequencing Project, and ExAC [13,25]. As expected, the vast majority of novel LOF/missense variants occurred with a MAF <0.1%, or even with an allele count of 1. Thus, the screening of DPWG variants is of high clinical utility and cost efficient, may even be performed for individuals assessed in large-scale biobanks [19], but is ultimately limited in its scope. With increasing sequencing data available, future efforts are warranted to focus on sequencing-based approaches to enable better precision medicine, however, thereby generating novel challenges such as the interpretation of variants of unknown significance (VUS) in pharmacogenes. In the event of additional PGx variants being described, we may re-analyze our WGS data and subsequently integrate such variants into the patients' updated PGx profiles, which are accessible using the QR code on the MSC.
Third, short-read sequencing is inherently limited in repetitive and/or homologous genomic regions (i.e., mappability <1; the so-called "dead zone" of the genome) [14], hampering accurate variant calling, for example in CYP genes. The most prominent example is the CYP2D6 gene, responsible for bioactivation or elimination of~25% of prescribed drugs [26], and its adjacent pseudogenes CYP2D7 and CYP2D8P. Several of the 139 star alleles currently listed in PharmVar (pharmvar.org/gene/CYP2D6) are possible to detect, however, the number of duplications and gene rearrangements leading to CYP2D6-CYP2D7 hybrid genes are notoriously difficult to genotype using short-read-based NGS approaches. To resolve these complex genotypes from short-read NGS data, freely available software tools, such as used in this study and others [18], have been introduced. Notwithstanding the powerful algorithms behind the tested tools Astrolabe, Aldy, and Stargazer, none of them assigned the correct GeT-RM 2019 consensus genotypes (generated using a variety of orthogonal methods) to all of the tested samples. In general, the tools accurately called most of the samples with simple CYP2D6 genotypes and with only two star alleles. Discordantly called genotypes occurred in complex SVs, where Aldy reached the overall highest accuracy, not excluding the possibility that Aldy might have been trained using some of the GeT-RM samples and that the GeT-RM genotypes might be incomplete. Thus, primarily Aldy represents an important addition to the NGS software toolkit. Although incorporating bioinformatic CYP2D6 analysis from short-read NGS data is a timesaving alternative to wetlab approaches, further evaluation of such software tools (e.g., by using long-read sequencing) is warranted. Another key challenge of PGx implementation using microarrays or short-read sequencing into actionable recommendations is phasing of variants. Several phasing algorithms exist, such as Beagle (implemented in Stargazer) [27], FastPHASE [28], and SHAPE-IT [29], all of which rely on statistical inference to assign diplotypes. The most efficient and straight forward method for variant phasing, however, is long-read sequencing, using e.g., PacBio (pacb.com) or Oxford Nanopore Technologies (nanoporetech.com). It has been shown that targeted long-read sequencing of the CYP2D6 gene enables phasing and accurate calling of SNVs, indels, and SVs [30][31][32], which could be further evaluated using long-read WGS or hybrid assemblies of short and long reads.
Fourth, providing the PGx information as the MSC [10] or the recently developed "PGx-Passport" [33] in addition to a written report, if used correctly, allows platform-independent implementation of pharmacogenetics and patient autonomy. In theory, by integrating information on individual PGx profiles for decision support of drug selection and dosage, ADE, costs, and time can be reduced, the extent of which is currently being investigated in the PREPARE trial [9]. The Netherlands takes on a pioneering role in the implementation of pharmacogenetics, integrating PGx information in the Dutch G-Standard database, which is used by all parties in the healthcare system and provides the PGx recommendation text if a drug is prescribed or purchased [34].
Fifth, not only healthcare systems, but also the drug development industry may benefit from the implementation of PGx profiling. Currently, only 10% of all drugs that initiate phase I clinical trials are subsequently approved [2], among other reasons potentially due to neglected patient stratification such as according to metabolizer status in clinical trials. This caveat is exemplified by the angiotensin-II-type-1-receptor-antagonist losartan, which showed highly promising results in preclinical studies [35], which, however, could not be replicated in large-scale clinical trials for Marfan syndrome [36]. Because~14% of losartan is oxidized by CYP2C9 into its 10-40× more potent metabolite E-3174, it is assumed that the majority of the effect of losartan stems from E-3174 [37]. For individuals with the frequently occurring slow metabolizer alleles CYP2C9*2,*3,*5, therefore an increased dosage of losartan might be necessary to achieve a therapeutic effect, which ultimately might explain the discrepancies between preclinical and (non-pharmacogenetically-stratified) clinical trials with losartan. Indeed, clinical trials might benefit from increased stratification according to PGx status, similarly as stratification is more common in oncology trials according to tumor driver variants [6].
We acknowledge following limitations of our study: The main limitation of our study is that for gnomAD, individual-level information is (yet) unavailable and thus we could not screen for co-occurring PGx variants in the same individual. In addition, because the here analyzed sequencing data were generated using short-read NGS, information on variant phasing is unavailable and therefore, to infer wildtype (*1) alleles, we assumed that variants detected in the same gene occur in trans, likely leading to an underestimation of wildtype alleles. Finally, as we limited our analysis to DPWG variants and LOF/missense variants, we likely missed other, pharmacogenetically-relevant variants, which, however, are difficult to interpret with the current knowledge. Considering these limitations, our calculations should be regarded as estimates, aiming to show the frequency of variant alleles in large-scale datasets.

PGx Profiling from WGS Data as well as Comparison of WGS and WES for PGx Profiling
WGS (PCR-free, 60× 150PE) of 547 individuals with rare, mainly cardiovascular or connective tissue disorders was performed as previously described [23]. Resulting raw data in the form of FASTQ files were aligned, generating binary alignment map (BAM) files, and variant calling of SNVs and small indels, generating variant call format (VCF) files, was performed using GENALICE MAP v2.5.6 (Genalice, Nijkerk, The Netherlands) as previously described [38]. For 11 DPWG pharmacogenes (CYP2B6, CYP2C9, CYP2C19, CYP2D6, CYP3A5, DPYD, F5, SLCO1B1, TPMT, UGT1A1, VKORC1) and an HLA-B*5701 tagging variant, common/known variants (i.e., known to generate DPWG guidelines) as well as rare/novel but potentially relevant LOF variants (SNVs and indels) were extracted from our 547 in-house genomes using GENALICE MAP's gaVariant module, generating small gVCF-like VCF files displaying homozygous reference alleles as well (Supplementary Table S2). Note that other variant calling tools such as GATK [39] may also be used to extract pharmacogenetically-relevant variants and to generate small gVCF-like VCF files. To generate a personalized PGx report and MSC according to the DPWG guidelines, SNVs, indels, and possible SVs (such as affecting CYP2D6) were combined into a single VCF file and used as input for the Genetic Information Management System (GIMS) portal of bio.logis (bio.logis Genomic Healthcare GmbH, FFM, Germany).
Moreover, for the 45 DPWG variants, all the coding exons in the 11 current DPWG genes, and the core and extended ADME genes listed in pharmaadme.org, we compared the read-depth coverage of

Analysis of Star Alleles and Loss-of-Function Variants in gnomAD and in Our In-House WGS Cohort
The corresponding VCF files of gnomAD exomes v2.1.1 (125 748 exomes, released 2017, genome build GRCh37/hg19) and gnomAD genomes v3 (71 702 genomes, released 2019, genome build GRCh38/hg38) were downloaded (gnomAD.broadinstitute.com/downloads; note that as the genomes of gnomAD v2.1.1 are largely incorporated in the genomes of gnomAD v3, we considered only gnomAD v3 for genomes). In our cohort containing 547 WGS samples, joint variant calling was performed using the Population Calling module of GENALICE MAP [38] to simultaneously extract all sequence variants in 11 DPWG pharmacogenes and an HLA-B*5701 tagging variant, generating a multi-sample VCF file. Note that due to short-read-related alignment ambiguities, LOF and missense variants in HLA-B were not considered and the HCP5 variant rs2395029 (Chr6(GRCh37):g.31431780T>G) was used as marker linked to the HLA-B*5701 allele in populations with European ancestry [41].
For all 11 analyzed pharmacogenes, relative allele frequencies of DPWG and LOF variants were assessed and compared among gnomAD v2.1.1, v3, and our in-house WGS cohort. To infer the number of wildtype (*1) alleles, we subtracted the number of detected DPWG and LOF alleles from the total allele numbers in gnomAD v3, v2.1.1, and our in-house cohort under the assumption that variants detected in the same gene occur in trans. In addition, in our in-house WGS cohort, but not in gnomAD, we were able to assess the co-occurrences of DPWG, LOF and/or likely pathogenic missense variants in the same individual.

Statistical Analysis
The upper and lower limits of 95% confidence intervals of relative frequencies were calculated using the online tool VassarStats with a correction for continuity (vassarstats.net/prop1.html).

Conclusions
Taken together, we show that short-read WGS, rather than WES, is suitable for the profiling of pharmacogenes, including CYP2D6. WGS-based clinical sequencing may therefore be the most comprehensive all-in-one approach for the simultaneous testing of Mendelian diseases and profiling of pharmacogenes without generating significant additional costs. Moreover, we demonstrate that known DPWG variants comprise the majority of PGx variation. Hence, restricting PGx profiling to these variants streamlines the interpretation process and provides appropriate pharmacogenetically-guided treatment recommendations for the vast majority of individuals. For true precision medicine, i.e., for the best possible pharmacogenetically-guided treatment recommendations for each patient, however, it is warranted that comprehensive approaches, such as presented here, expand the targeted profiling of known DPWG variants to the genome-wide profiling of all pharmacogenes.