Comparison of Eight Technologies to Determine Genotype at the UGT1A1 (TA)n Repeat Polymorphism: Potential Clinical Consequences of Genotyping Errors?

To ensure accuracy of UGT1A1 (TA)n (rs3064744) genotyping for use in pharmacogenomics-based irinotecan dosing, we tested the concordance of several commonly used genotyping technologies. Heuristic genotype groupings and principal component analysis demonstrated concordance for Illumina sequencing, fragment analysis, and fluorescent PCR. However, Illumina sequencing and fragment analysis returned a range of fragment sizes, likely arising from PCR “slippage”. Direct sequencing was accurate, but it produced ambiguous electropherograms that hampered interpretation of heterozygotes. Gel sizing, pyrosequencing, and array-based technologies were less concordant. Pharmacoscan genotyping was concordant, but it does not ascertain (TA)8 genotypes, which are common in African populations. Method-based genotyping differences were also observed in the publication record (p < 0.0046), although fragment analysis and direct sequencing were concordant (p = 0.11). Genotyping errors can have significant consequences in a clinical setting. At the present time, we recommend that all genotyping for this allele be conducted with fluorescent PCR (fPCR).


Introduction
The uridine diphosphate glucuronosyltransferase 1A1 (UGT1A1) gene is involved in glucuronidation of a wide variety of substances, including drugs, endobiotics, dietary, and environmental compounds [1]. Glucuronidation both inactivates [2] and facilitates clearance of UGT1A1 substrates [3]. Genetic variation in UGT1A1 is associated with wide variability in UGT1A1 enzymatic activity that affects the pharmacokinetics and activity of over 50 therapeutics, including anticancer agents, antiretrovirals, NSAIDs, corticosteroids, anti-inflammatory agents, and many others [1]. Several agents now mention UGT1A1 genotyping in their package insert, and several hundred published studies are devoted to UGT1A1-related pharmacogenomics associations [1,4-11]. As we continue to make progress with implementation of clinical pharmacogenomics at the NIH Clinical Center, UGT1A1 genotype accuracy is an important issue to address [12].
During assay development for two genotype-directed clinical trials involving a liposomal formulation of irinotecan (MM-398; NCT03221400) and an SN-38-conjugated HSP90 inhibitor (PEN-866; NCT02631733), we noted several instances in which different genotyping technologies reported different rs3064744 genotypes for the same control samples. We therefore assessed the concordance of eight genotyping technologies: direct sequencing, pyrosequencing, gel sizing, DMET Plus arrays, Pharmacoscan arrays, Illumina (MiSeq), fragment analysis, and fluorescent PCR (fPCR).

Genotype Concordance between Illumina Sequencing, Fragment Analysis, and Fluorescent PCR
Shorter PCR fragments were preferentially amplified in both Illumina sequencing and fragment analysis (Figure 1). For example, Illumina sequencing of a known (TA)6 homozygote revealed a range of fragment sizes attributable to PCR "slippage": 2.5% (TA)4, 18.1% (TA)5, 57.5% (TA)6, 11.7% (TA)7, and 0.3% (TA)8. However, because the distributions of fragment-size percentages had non-overlapping ranges across genotypes, we developed an algorithm that uses the raw percentages of fragment sizes (Table S2) to assign genotypes to Illumina MiSeq and fragment analysis data. Fragment analysis was also conducted using commercially available controls (Table S3). Heuristic groupings were confirmed using principal component analysis (Figure S1), although the principal components did not distinguish well between (TA)5/(TA)6 and (TA)5/(TA)7 genotypes in the Illumina data because of the small number of carriers. Controls (UGT1A1*36/*1, UGT1A1*1/*1, UGT1A1*1/*28, and UGT1A1*28/*28) were repeated in multiple fragment analysis assays; hence, the number of experimental replicates differs between samples. The percentages of fragment sizes were remarkably consistent between the two genotyping methods, suggesting that amplification in both assays was similarly error prone. This analysis led to complete concordance between the two methods (Table 3). We sent samples representing each genotype to Quest Diagnostics for genotyping by fluorescent PCR (UGT1A1*1/*36, n = 2; UGT1A1*36/*28, n = 2; UGT1A1*1/*1, n = 2; UGT1A1*1/*28, n = 3; UGT1A1*1/*37, n = 1; UGT1A1*28/*28, n = 2; UGT1A1*28/*37, n = 1). These results were also completely concordant with the Illumina and fragment analysis data.
We concluded that these methods were accurate on the basis of three lines of evidence: (1) the three genotyping methods (Illumina sequencing, fragment analysis, and fPCR) were completely concordant, (2) genotypes were concordant with controls, and (3) Illumina sequencing and fragment analysis produced similar fragment-size distributions that were distinct enough to bin into separate genotype categories.
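The range-based binning step described in this subsection (assigning a genotype from the raw percentages of each fragment size) can be sketched as follows. This is a minimal illustration, assuming each genotype is defined by a low/high percentage window for a few fragment sizes; the real windows are in Table S2 and are not reproduced here, so `EXAMPLE_RULES` holds placeholder values chosen only so that the (TA)6 homozygote profile quoted above lands in a single bin. The names `to_percentages` and `assign_genotype` are hypothetical.

```python
def to_percentages(read_counts):
    """Convert {repeat_number: read_count} into {repeat_number: percent of reads}."""
    total = sum(read_counts.values())
    if total == 0:
        return {}
    return {n: 100.0 * count / total for n, count in read_counts.items()}


def assign_genotype(profile, rules):
    """Return the one genotype whose percentage windows all contain the profile.

    profile: {repeat_number: percent}; repeat numbers missing from it count as 0%.
    rules:   {genotype: {repeat_number: (low, high)}}.
    Returns None if no genotype, or more than one, matches (an ambiguous call).
    """
    matches = [
        genotype
        for genotype, windows in rules.items()
        if all(low <= profile.get(n, 0.0) <= high for n, (low, high) in windows.items())
    ]
    return matches[0] if len(matches) == 1 else None


# PLACEHOLDER windows for illustration only; the real cut-offs are in Table S2.
EXAMPLE_RULES = {
    "(TA)6/(TA)6": {6: (45.0, 70.0), 7: (5.0, 20.0)},
    "(TA)6/(TA)7": {6: (25.0, 45.0), 7: (25.0, 45.0)},
}

# Illumina profile of the known (TA)6 homozygote quoted in the text.
observed = {4: 2.5, 5: 18.1, 6: 57.5, 7: 11.7, 8: 0.3}
print(assign_genotype(observed, EXAMPLE_RULES))  # (TA)6/(TA)6
```

Returning `None` when zero or several windows match mirrors the idea that binning only works because the ranges do not overlap; any overlap shows up as an explicit no-call rather than a silent guess.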

Genotype Concordance in Direct Sequencing, Gel Sizing, Pyrosequencing, DMET Plus, and Pharmacoscan
As previously observed [19], direct sequencing (in both directions) led to ambiguous genotype calls in most heterozygotes, because electropherograms from heterozygotes sharing a common allele differed only nominally. Electropherograms from UGT1A1*1/*28 carriers were only nominally different from those of UGT1A1*1/*37 carriers, and the same was true for UGT1A1*36/*1 and UGT1A1*36/*28 carriers (Figure 2). In our hands, these minor differences caused substantial genotyping uncertainty for rarely observed heterozygotes until enough genotype diversity had accumulated to allow comparison across a sufficient variety of genotypes. For several individuals who were homozygous according to other methods (Illumina sequencing and fragment analysis), direct-sequencing genotypes were also ambiguous because of high background following the (TA)n repeat. We attribute both problems to artifacts generated during PCR amplification.

Phenotype Concordance
Despite the lack of genotype concordance, all assays were phenotypically concordant with two exceptions: (1) the DMET Plus array misclassified nine IMs (UGT1A1*1/*28 and UGT1A1*36/*28) as EMs, and (2) the Pharmacoscan array misclassified an IM (UGT1A1*1/*37) as an EM. Because the pyrosequencing assay often gave unclear results, UGT1A1 phenotype would have gone uncalled in seven cases with this technology. Direct sequencing and Pyromark gels would likely have produced incorrect phenotypes in many cases. No platform for which we received raw data provided clear genotypes without substantial post-processing, which could impede phenotype assessment in many cases.
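The phenotype comparison above reduces each diplotype to NM (EM), IM, or PM. A minimal sketch of that reduction follows, assuming that (TA)7 (*28) and (TA)8 (*37) count as decreased-function alleles and (TA)5 (*36) and (TA)6 (*1) as normal, which matches how this section labels *1/*28, *36/*28, and *1/*37 as IM. The helper `call_without_ta8` is hypothetical: it models a panel that cannot see (TA)8 and reports it as (TA)6, reproducing the *1/*37 to *1/*1 miscall described for Pharmacoscan.

```python
# Repeat number -> star allele at rs3064744.
STAR_ALLELE = {5: "*36", 6: "*1", 7: "*28", 8: "*37"}

# Assumption: (TA)7 and (TA)8 are decreased function; (TA)5 and (TA)6 are not.
DECREASED_FUNCTION = {7, 8}


def phenotype(allele_a, allele_b):
    """Map a (TA)n diplotype to NM / IM / PM by counting decreased-function alleles."""
    for n in (allele_a, allele_b):
        if n not in STAR_ALLELE:
            raise ValueError(f"unexpected (TA)n repeat number: {n}")
    decreased = (allele_a in DECREASED_FUNCTION) + (allele_b in DECREASED_FUNCTION)
    return ("NM", "IM", "PM")[decreased]


def call_without_ta8(allele_a, allele_b):
    """Hypothetical panel blind to (TA)8: any (TA)8 allele is reported as (TA)6 (*1)."""
    return tuple(6 if n == 8 else n for n in (allele_a, allele_b))


true_call = (6, 8)                        # *1/*37
panel_call = call_without_ta8(*true_call)  # reported as *1/*1
print(phenotype(*true_call), phenotype(*panel_call))  # IM NM
```

Applied to a (TA)8 homozygote, the same helper turns a PM into an NM, which is the worst case for a platform that does not ascertain (TA)8.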

Comparisons with Previously Published Peer-Reviewed Data
We next searched the literature for publications ascertaining UGT1A1 rs3064744 in healthy American and European Caucasians (n = 11,145) and healthy individuals of African descent (n = 1707). We identified 136 publications appearing between 1996 and 2018 that genotyped this allele in populations without disease or conditions directly related to UGT1A1 function (e.g., hyperbilirubinemia). Of these, 71 publications contained data comparable with the present study, generated by direct sequencing (26 studies), fragment analysis (52 studies), gel sizing (17 studies), or pyrosequencing (14 studies; Table S4). Genotype and allele frequencies differed substantially between the methods applied to genotyping Caucasians (P < 0.0001). This difference was primarily attributable to gel sizing, which yielded a lower frequency of (TA)6/(TA)6 genotypes (43.0%) and a higher frequency of (TA)6/(TA)7 genotypes (47.7%) than the other methods (%(TA)6/(TA)6 = 44.3, 43.8, and 46.1, and %(TA)6/(TA)7 = 44.3, 43.9, and 45.9 for direct sequencing, fragment analysis, and pyrosequencing, respectively). When gel sizing was excluded, the remaining three methods still differed (P = 0.0046). This difference was attributable to pyrosequencing, which yielded a higher frequency of (TA)6/(TA)7 and a lower frequency of (TA)7/(TA)7 than the others (%(TA)7/(TA)7 = 11.5, 12.3, 9.3, and 8.0 for direct sequencing, fragment analysis, gel sizing, and pyrosequencing, respectively). Data obtained from direct sequencing and fragment analysis did not differ in either Caucasians (P = 0.11) or individuals of African descent (P = 0.92).
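The section does not state which statistical test produced the P values above. As one plausible reading, the sketch below assumes a Pearson chi-square test of independence on a methods-by-genotype count table, implemented with the standard library only (the chi-square tail probability uses the closed forms for integer degrees of freedom). The per-method counts behind Table S4 are not reproduced here, so the example uses toy counts; `chi_square_test` and `chi2_sf` are hypothetical names.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum_j x^(j - 1/2) / (1*3*...*(2j-1))
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


def chi_square_test(table):
    """Pearson chi-square test of independence for a rows x columns count table.

    Rows could be genotyping methods and columns genotype classes.
    Returns (statistic, degrees_of_freedom, p_value).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("drop empty rows/columns before testing")
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    if df < 1:
        raise ValueError("need at least a 2 x 2 table")
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, df, chi2_sf(stat, df)


# Toy counts only (NOT the Table S4 data): rows = two methods, columns = two genotype classes.
stat, df, p = chi_square_test([[10, 20], [20, 10]])
print(f"chi2 = {stat:.3f}, df = {df}, p = {p:.4f}")
```

With real data, each row would hold one method's genotype counts pooled across studies; sparse genotype classes (for example (TA)5 or (TA)8 carriers in Caucasians) would usually need to be merged first, because the chi-square approximation degrades when expected counts are small.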

Discussion
The present study was undertaken to address concerns about genotype accuracy in two genotype-directed prospective clinical trials utilizing a liposomal irinotecan formulation (MM-398) and an HSP90 inhibitor conjugated to SN-38 (PEN-866; NCT03221400 and NCT02631733, respectively), for which we chose to use fPCR by Quest Diagnostics. Although we demonstrated that fPCR (Table S1) is an accurate genotyping method, our findings suggest that patients who undergo UGT1A1 genotyping for dosing or therapeutic choice may be underserved by many other current genotyping technologies, regardless of the CLIA certification of the Illumina sequencing, fragment analysis, DMET analysis, and pyrosequencing used herein. Genotyping errors have the potential to persist in the medical record, as the germline does not change throughout an individual's lifetime. Such errors can expose patients to a variety of iatrogenic hazards [15,21]. Thus, as pharmacogenomic research continues to progress and new UGT1A1-related gene-drug interactions are identified and characterized, selection of the appropriate genotyping test for UGT1A1 rs3064744 is critical [1].
We showed that only fPCR provides unambiguous results that require no post-processing of data, although the Nichols Institute, which conducted the test, does not disclose its methods for this assay [22]. Although Illumina sequencing and fragment analysis were shown to be accurate, our results suggest that even these technologies can be easily misinterpreted because they amplify multiple smaller fragments that are not representative of a patient's genotype. The present results also demonstrate that several technologies are inappropriate for obtaining specific genotypes at rs3064744: pyrosequencing, Pyromark gels, DMET Plus, and Pharmacoscan. Pyrosequencing yields many ambiguous calls because additional peaks complicate interpretation of pyrograms. Pyromark gel and Bioanalyzer genotyping frequently show peaks that could fall into several genotype categories, and we could not confidently call genotypes with this technology. Analysis of previously published studies also demonstrates that pyrosequencing and gel sizing results differ from those of other methods. The DMET Plus array produced nine miscalls, all of which classified IMs as EMs. Although Pharmacoscan has improved the specificity and accuracy of genotyping, it does not detect UGT1A1*37 alleles, and a patient who was UGT1A1*1/*37 (an IM) was called UGT1A1*1/*1 (an NM). Thus, even if one ignores specific genotypes and only classifies patients as NM, IM, or PM, DMET Plus and Pharmacoscan could each introduce genotyping errors into the medical record. Because (TA)5 and (TA)8 alleles are more frequent in African Americans [23], this population is at particular risk of genotype miscalls. As racial admixture becomes more prevalent in the United States [24], such genotyping errors are of particular concern.
Our results also extend to other clinical settings, including dosing and therapeutic selection for traditional irinotecan formulations. The Dutch Pharmacogenetics Working Group (DPWG) recommends starting UGT1A1*28/*28 carriers at 70% of the standard irinotecan dose, with further dosing guided by neutrophil count [25]. The French National Network of Pharmacogenetics (RNPGx) advises a 25%-30% dose reduction in UGT1A1*28/*28 carriers receiving 180-230 mg/m² at 2-3-week intervals, and recommends that UGT1A1*1/*28 and UGT1A1*28/*28 carriers receive less than 240 mg/m² [26]. Recommendations have not been established for other genotypes because data on UGT1A1*36 and UGT1A1*37 carriers are sparse. Because doses are subsequently increased on the basis of neutrophil count, risk is concentrated in patients who experience excessive toxicity from a high starting dose. Thus, genotyping errors are of particular concern for UGT1A1*28/*28 carriers receiving 180-230 mg/m² and for UGT1A1*1/*28 or UGT1A1*28/*28 carriers receiving more than 240 mg/m² of irinotecan. DMET genotyping would have incorrectly reported nine intermediate metabolizers as extensive metabolizers; these patients could therefore have received potentially dangerous irinotecan doses greater than 240 mg/m². Apart from fPCR, the other genotyping platforms could have led to non-calls or incorrect calls, because their results were often ambiguous or incorrect.
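The two dose situations flagged in this paragraph are easy to conflate, so the sketch below encodes only those two conditions, using the RNPGx thresholds exactly as quoted here (UGT1A1*28/*28 at 180-230 mg/m²; UGT1A1*1/*28 or *28/*28 above 240 mg/m²). It is a hypothetical audit helper (`miscall_is_high_concern` is not from any guideline software): applied to a patient's true diplotype and planned dose, it marks cases where a miscall would matter most. It does not compute doses.

```python
def miscall_is_high_concern(diplotype, dose_mg_m2):
    """Flag the two situations singled out in the text (RNPGx thresholds as quoted).

    diplotype: pair of star alleles, e.g. ("*1", "*28"); order does not matter.
    dose_mg_m2: planned irinotecan dose in mg/m^2.
    """
    pair = tuple(sorted(diplotype))
    # *28/*28 carriers receiving 180-230 mg/m^2
    if pair == ("*28", "*28") and 180 <= dose_mg_m2 <= 230:
        return True
    # *1/*28 or *28/*28 carriers receiving more than 240 mg/m^2
    if pair in {("*1", "*28"), ("*28", "*28")} and dose_mg_m2 > 240:
        return True
    return False


# A true *1/*28 patient (reported as *1/*1 by an array) planned for 300 mg/m^2.
print(miscall_is_high_concern(("*28", "*1"), 300))  # True
```

Evaluating the function on the true diplotype rather than the reported one is the point: the DMET Plus miscalls described above would pass a check run on the reported *1/*1 genotype.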
Atazanavir prescribing is also based on UGT1A1 metabolizer status (NM, IM, or PM). Patients with PM status are at risk of jaundice leading to atazanavir discontinuation (in approximately 20%-60% of such patients), and alternative agents should be considered [15]. One patient in our cohort was called *1/*1 (NM) by Pharmacoscan but was *1/*37 (IM) according to the other genotyping methods and ambiguous by pyrosequencing. Such a patient would not necessarily be at risk under current dosing guidelines, but a *37/*37 (PM) patient would most likely have been called *1/*1 and treated improperly. More generally, an incorrect genotype assigned before atazanavir therapy would propagate in the medical record, potentially exposing the patient to improper therapy in the future.
Because the rs3064744 locus appears to have significant phenotypic consequences for a wide variety of agents, genotyping errors at this site are of particular concern as pharmacogenomics research continues to discover new interactions with this variant. At best, ambiguous genotype results would lead to unnecessary delays in therapy when urgent treatment is required [12]. At worst, incorrect genotypes could harm patients and be propagated in the medical record, potentially resulting in greater complications. Finally, the present lack of data on irinotecan outcomes in UGT1A1*36 and UGT1A1*37 carriers likely also reflects genotyping errors in the literature. At the present time, we recommend that all genotyping for this allele be conducted with fPCR. Technological innovation to overcome genotype miscalls at this site is urgently needed for both clinical and scientific purposes.

Patients and Samples
The patient cohort was derived from a prospective pharmacogenomics trial [27]. Briefly, 546 patients with a histological diagnosis of primary lung carcinoma were enrolled between 2009 and 2012 (NCT00923884). The study included individuals with a histological diagnosis of non-small cell (stage I-IV) or small cell lung cancer (limited or extensive stage) who received any treatment (surgical resection, chemotherapy, radiation, or molecularly targeted therapy), had any ECOG score (0-3), and had normal or impaired organ function. Patients were not precluded from enrolling if they had a history of other cancers. Of these, 163 patients who received paclitaxel were genotyped for a prior publication, and no associations between UGT1A1 genotypes and clinical outcomes or patient or disease parameters were detected [27]. Because the present study was concerned with genotyping methods, clinical outcomes were not considered relevant. The study was approved by the Institutional Review Board at the National Cancer Institute (Bethesda) and the Veterans Affairs Medical Center (Washington D.C.) (protocol 09C0103, approved 5 March 2009), and all patients provided informed consent. Genomic DNA was extracted from blood samples using the QIAamp DNA Blood kit (Qiagen, Germantown, MD, USA). CLIA-certified fPCR was conducted by the Nichols Institute (Quest Diagnostics Inc, San Juan Capistrano, CA, USA), and Pharmacoscan was conducted by RUCDR Infinite Biologics (Nelson Biological Laboratories, Piscataway, NJ, USA).

Illumina Sequencing (MiSeq)
Primers were designed for the region of interest within the promoter of UGT1A1, specifically dbSNP ID: rs3064744.
For the MiSeq amplicon design, the FW primer sequence was 5′-TTTATCTCTGAAAGTGAACTC-3′ and the RV primer sequence was 5′-TGGGCGTCCGCCCTGGGACTC-3′. These primers were tagged with M13FW and M13RV sequences: 5′-gtaaaacgacggccagt-3′ (FW strand) and 5′-ggaaacagctatgaccatg-3′ (RV strand). All primers were purchased from Thermo Scientific. For the PCR, Invitrogen's High-Fidelity Taq System (Thermo Fisher Scientific, Waltham, MA, USA) and 10 mM dNTPs (Thermo Fisher Scientific) with 5% molecular grade dimethyl sulfoxide (Sigma-Aldrich, St. Louis, MO, USA) were utilized. The M13-tagged primers were used to generate the target amplicons for the libraries from DNA with an Applied Biosystems Veriti 96-well thermal cycler (Thermo Fisher Scientific) using the following conditions: 95 °C for 5 min; followed by 20 cycles of 94 °C for 1 min, 58 °C for 1 min, 72 °C for 1 min; followed by 20 more cycles of 94 °C for 1 min, 65 °C for 1 min, 72 °C for 1 min; and completed with a final extension of 72 °C for 10 min, then holding at 4 °C. The resulting target-specific amplicons were used directly in the adapter PCR that follows.
For this step, barcoded primers for bidirectional coverage of each target amplicon were designed per Illumina indexing protocols (www.Illumina.com) using Illumina P5 and P7 adapter indexes, barcodes, and M13 adapters. These adapters were added to the amplicons to create the library samples with the Veriti thermal cycler under the following conditions: 95 °C for 2 min; followed by 15 cycles of 94 °C for 30 s, 55 °C for 30 s, 72 °C for 1 min; and completed with a final extension of 72 °C for 1 min, then holding at 4 °C. The PCR system was Invitrogen's Platinum Taq along with dNTPs.
The libraries were then analyzed for quality and quantity using Agilent's 2100 Bioanalyzer (Agilent, Santa Clara, CA, USA) and Agilent's DNA 1000 kit. Using Bioanalyzer software version 2100 Expert B.02.08 SI648, region tables were assigned to the libraries, giving approximate sizes and concentrations. Any libraries that failed to amplify well were repeated from the initial PCR step and were not included in the final pool until amplification was optimal. Each library sample was also quality checked using Thermo Fisher Scientific's NanoDrop ND-8000 spectrophotometer. If required, library samples were normalized to the lowest concentration in the Bioanalyzer data, and all libraries were then pooled at that equimolar concentration. The final library pool was checked once more for concentration with the ND-8000.
Using the approximate base-pair size and the final NanoDrop concentration, the appropriate loading concentration was determined for the library. The library was then processed according to Illumina's MiSeq System Denature and Dilute Libraries Guide. The PhiX Control Kit v3 (Illumina, San Diego, CA, USA) was used as the control.
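The normalization and loading steps above combine a mass-to-molarity conversion with a C1V1 = C2V2 dilution. The source does not give the formula it used, so this sketch assumes the common dsDNA approximation of 660 g/mol per base pair, nM = (ng/µL × 10^6) / (660 × size in bp), and dilutes every library to the lowest molarity in the set, as the normalization step describes. Function names and the example concentrations are hypothetical.

```python
AVG_DSDNA_MASS_PER_BP = 660.0  # g/mol per base pair, a common approximation for dsDNA


def ng_per_ul_to_nM(conc_ng_per_ul, size_bp):
    """Convert a dsDNA concentration (ng/uL) to molarity (nM) for fragments of size_bp."""
    return conc_ng_per_ul * 1e6 / (AVG_DSDNA_MASS_PER_BP * size_bp)


def normalize_to_lowest(conc_nM, final_volume_ul):
    """Plan dilutions that bring every library to the lowest molarity in the set.

    conc_nM: {library_name: molarity}.
    Returns (target_nM, {name: (library_uL, water_uL)}).
    """
    target = min(conc_nM.values())
    plan = {}
    for name, conc in conc_nM.items():
        library_ul = target * final_volume_ul / conc  # C1V1 = C2V2
        plan[name] = (library_ul, final_volume_ul - library_ul)
    return target, plan


# Hypothetical libraries of ~250 bp quantified at 12 and 8 ng/uL.
libraries = {"S1": ng_per_ul_to_nM(12.0, 250), "S2": ng_per_ul_to_nM(8.0, 250)}
target, plan = normalize_to_lowest(libraries, final_volume_ul=10.0)
print(round(target, 2), {k: tuple(round(v, 2) for v in vols) for k, vols in plan.items()})
```

Normalizing down to the weakest library (rather than up to a fixed target) never requires concentrating a sample, which is presumably why the protocol pools at the lowest concentration observed.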
Barcode-demultiplexed FASTQ files from the MiSeq run were evaluated for the presence of variants of the reference sequence TTTTTGCCATATATATATATATAGTAGGAGAGGGCGAACC. Variants were enumerated by collecting, for each sample, all sequence reads bounded by TTTTTGCCA and AGTAGGAGAGGGCGAACC. All variants for each sample were tabulated and sorted by length and nucleotide sequence. Variant frequency was calculated as the number of identical reads for each variant divided by the total number of qualifying reads in the sample.
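The read-tabulation procedure above maps directly onto a short script. This is a minimal sketch, not the pipeline actually used: it captures the sequence between the two quoted flanks, counts identical variants per sample, sorts by length and then sequence, and divides by the number of qualifying reads. Checking the reverse complement of each read (for bidirectional coverage) and reading uncompressed FASTQ as four lines per record are assumptions; all function names are hypothetical.

```python
import re
from collections import Counter

# Flanking sequences quoted in the text; the variable region between them is captured.
LEFT_FLANK = "TTTTTGCCA"
RIGHT_FLANK = "AGTAGGAGAGGGCGAACC"
_BOUNDED = re.compile(LEFT_FLANK + "([ACGTN]*?)" + RIGHT_FLANK)
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def fastq_sequences(path):
    """Yield sequence lines from an uncompressed FASTQ file (4 lines per record)."""
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:
                yield line.strip()


def tabulate_variants(reads):
    """Return (variant, count, frequency) rows for one sample.

    A read qualifies if it (or, as an assumption, its reverse complement) contains
    LEFT_FLANK ... RIGHT_FLANK. Rows are sorted by variant length, then sequence.
    """
    counts = Counter()
    for read in reads:
        read = read.upper()
        for strand in (read, reverse_complement(read)):
            match = _BOUNDED.search(strand)
            if match:
                counts[match.group(1)] += 1
                break
    total = sum(counts.values())
    if total == 0:
        return []
    ordered = sorted(counts.items(), key=lambda item: (len(item[0]), item[0]))
    return [(variant, n, n / total) for variant, n in ordered]
```

For one sample, `tabulate_variants(fastq_sequences("sample.fastq"))` (file name hypothetical) yields the per-variant frequencies that, grouped by repeat length, form the fragment-size percentage profiles used for genotype binning.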

Fragment Analysis
Primers were designed for the region of interest within the promoter of UGT1A1, specifically dbSNP ID: rs3064744. This design was a nested PCR, and the outer PCR primers were as follows: FW primer 5′-TTCTTCCTCTCTGGTAACACTT-3′, RV primer 5′-ACTCTTTCACATCCTCCCTT-3′. For the PCR assay, Invitrogen's High Fidelity Taq System (Thermo Fisher Scientific) and Invitrogen 10 mM dNTPs (Thermo Fisher Scientific) with 5% molecular grade dimethyl sulfoxide (DMSO) (Sigma-Aldrich) were utilized. Samples were amplified with an Applied Biosystems Veriti 96-well thermal cycler (Thermo Fisher Scientific) using the following conditions: 95 °C for 5 min; followed by 20 cycles of 94 °C for 1 min, 58 °C for 1 min, 72 °C for 1 min. This was followed by 20 more cycles of 94 °C for 1 min, 65 °C for 1 min, 72 °C for 1 min, with a final extension of 72 °C for 10 min, then holding at 4 °C.
The generated PCR amplicons were then purified using exonuclease I (GE Healthcare, Pittsburgh, PA, USA) and shrimp alkaline phosphatase (Affymetrix), in accordance with the Exo-Sap protocol. The Exo-Sap-sample mixture was then incubated in the Veriti thermal cycler: 37 °C for 15 min, then 80 °C for 15 min, followed by a 4 °C hold. This purified amplicon was then used in the inner PCR described below.
For the inner PCR, the FW PCR primers were 5′-GCTCCACCTTCTTTATCTCTG-3′, 5′-FAM-GCTCCACCTTCTTTATCTCTG-3′, and 5′-GTTTCTGCTCCACCTTCTTTATCTCTG-3′ (pigtailed FW). The RV PCR primers were 5′-ATCAACAGTATCTTCCCAGC-3′, 5′-FAM-ATCAACAGTATCTTCCCAGC-3′, and 5′-GTTTCTATCAACAGTATCTTCCCAGC-3′ (pigtailed RV). Primer mixes were prepared before amplification. For the FW FAM-labeled product, a 20 µM FAM-FW primer mix was made by combining 480 µL of molecular grade water, 18 µL of 500 µM RV primer, and 2 µL of 500 µM FAM-labeled FW primer. Amplification was then carried out with this FAM-FW/RV primer mix and the pigtailed RV primer, each at a final concentration of 0.8 µM. For the RV FAM-labeled product, a 20 µM FAM-RV primer mix was made by combining 480 µL of molecular grade water, 18 µL of 500 µM FW primer, and 2 µL of 500 µM FAM-labeled RV primer. Amplification was then carried out with this FAM-RV/FW primer mix and the pigtailed FW primer, each at a final concentration of 0.8 µM. All samples were tested with both the FAM-FW and the FAM-RV primer mixes for comparison and confirmation. Samples were amplified for the inner PCR using Platinum Taq and dNTPs on the Veriti thermal cycler under the following conditions: 94 °C for 5 min, then 20 cycles of 94 °C for 30 s, 58 °C for 30 s, 72 °C for 30 s; followed by a final extension of 72 °C for 7 min, then holding at 4 °C.
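The primer-mix recipe above is a C1V1 = C2V2 calculation, and working it through shows what the "20 µM" mix actually contains: 18 µM of unlabeled primer and only 2 µM of the FAM-labeled primer. The sketch below reproduces that arithmetic and the fold dilution to the stated 0.8 µM working concentration. It assumes that "0.8 µM final" refers to the combined primer concentration of the mix; that reading, and the variable names, are not taken from the source.

```python
def final_conc(stock_uM, added_ul, total_ul):
    """C1V1 = C2V2: concentration after adding added_ul of a stock to total_ul."""
    return stock_uM * added_ul / total_ul


# FAM-FW/RV mix as described: 480 uL water + 18 uL RV (500 uM) + 2 uL FAM-FW (500 uM).
WATER_UL, RV_UL, FAM_FW_UL, STOCK_UM = 480.0, 18.0, 2.0, 500.0
total_ul = WATER_UL + RV_UL + FAM_FW_UL
rv_uM = final_conc(STOCK_UM, RV_UL, total_ul)          # unlabeled RV in the mix
fam_fw_uM = final_conc(STOCK_UM, FAM_FW_UL, total_ul)  # FAM-labeled FW in the mix
mix_uM = rv_uM + fam_fw_uM                             # total primer in the mix

# Assumption: "0.8 uM final" refers to the combined primer concentration of the mix.
REACTION_UM = 0.8
fold_dilution = mix_uM / REACTION_UM                   # how far the mix is diluted
rv_in_reaction = REACTION_UM * rv_uM / mix_uM          # unlabeled RV in the reaction
fam_fw_in_reaction = REACTION_UM * fam_fw_uM / mix_uM  # FAM-FW in the reaction

print(total_ul, rv_uM, fam_fw_uM, mix_uM)  # 500.0 18.0 2.0 20.0
```

Under this reading the mix is used at a 1:25 dilution, leaving about 0.72 µM unlabeled RV and 0.08 µM FAM-FW in each reaction; the FAM-RV/FW mix is the mirror image, with the labels swapped.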
The resulting products were then checked for quality and concentration with Agilent's 2100 Bioanalyzer (Agilent) and Agilent's DNA 1000 kit, using Bioanalyzer software version 2100 Expert B.02.08 SI648. These samples were then diluted 1:10 with molecular grade water to prepare them for fragment analysis. A master mix was created using Applied Biosystems Hi-Di Formamide (Thermo Fisher Scientific) and Applied Biosystems GeneScan 500 LIZ (Thermo Fisher Scientific). All incubations were carried out with the Veriti thermal cycler. Samples were then processed on an Applied Biosystems 3730xl DNA Analyzer (Thermo Fisher Scientific) with a 96-capillary, 50 cm array, using data generated with the DS-33 Matrix Standard Kit (Dye Set 5) (Thermo Fisher Scientific). Fragment analysis data were reviewed with Thermo Fisher Scientific Peak Scanner 1.0 software.
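Converting sized peaks into (TA)n repeat numbers relies on each TA unit adding 2 bp to the amplicon. Because sizes called against a size standard can differ from true lengths by a few bases, this sketch calibrates against a sized peak from a control of known genotype on the same run rather than predicting sizes from primer coordinates; the reference size, the use of peak height (area would work the same way), and the function names are all hypothetical. The output is a percentage profile of the kind used for genotype binning in the results.

```python
def repeats_from_size(size_bp, ref_size_bp, ref_repeats=6):
    """Estimate (TA)n for a sized peak relative to a control peak of known repeat number.

    Each TA unit adds 2 bp, so n = ref_repeats + (size - ref_size) / 2, rounded.
    """
    return ref_repeats + round((size_bp - ref_size_bp) / 2)


def profile_from_peaks(peaks, ref_size_bp, ref_repeats=6):
    """Turn sized peaks into {repeat_number: percent of total peak height}.

    peaks: list of (size_bp, height) pairs from one sample.
    """
    heights = {}
    for size, height in peaks:
        n = repeats_from_size(size, ref_size_bp, ref_repeats)
        heights[n] = heights.get(n, 0.0) + height
    total = sum(heights.values())
    return {n: 100.0 * h / total for n, h in heights.items()} if total else {}


# Hypothetical run: a (TA)6 control peak sized at 250.0 bp; three sample peaks.
print(profile_from_peaks([(248.1, 100), (250.2, 300), (252.1, 100)], ref_size_bp=250.0))
```

Summing stutter peaks by repeat number, rather than keeping only the two tallest peaks, preserves the slippage pattern that the results section uses as a genotype signature.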

Direct Sequencing (in-House)
Primer pairs were designed on the basis of the gene sequence available in GenBank; the primer sequences were 5′-AAGCGGGGGTACAGTTGTGTTC-3′ and 5′-AAGAATACAGTGGGCAGAGACAG-3′. PCR reactions were carried out in a 50 µL reaction mixture containing 200 ng of genomic DNA, 1X PCR buffer (Thermo Fisher Scientific, Waltham, MA, USA), 1.5 mmol/L MgCl2, 0.2 mM deoxynucleotide triphosphates, 800 nM of each primer (F1 and R1), and 1.25 U Platinum Taq DNA polymerase (Invitrogen) on a GeneAmp PCR System 9700 (Thermo Fisher Scientific, Waltham, MA, USA) with the following thermal profile (primary PCR): 40 cycles of denaturation at 94 °C for 30 s, annealing at 66 °C for 30 s, and extension at 72 °C for 30 s. After amplification, the quality of the PCR products was verified by agarose gel electrophoresis. The PCR products were then sequenced on an ABI Prism 3130xl Genetic Analyzer (Applied Biosystems) per the manufacturer's instructions using the following sequencing primers: 5′-TCCTTCTTCCTCTCTGGTAAC-3′ and 5′-ACATTATGCCCGAGACTAAC-3′.
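The final concentrations listed for the direct-sequencing PCR translate into per-reaction pipetting volumes once stock concentrations are fixed. The source does not state stock concentrations, so every stock value below (10X buffer, 50 mM MgCl2, 10 mM each dNTP, 10 µM primers, 5 U/µL Taq, 50 ng/µL genomic DNA) is a hypothetical placeholder to be replaced with the reagents actually in use; `reaction_volumes` is an illustrative helper, not part of the published protocol.

```python
# (component, final amount, stock, kind) for one 50 uL reaction.
# Final amounts follow the protocol; stock values are HYPOTHETICAL placeholders.
COMPONENTS = [
    ("PCR buffer", 1.0, 10.0, "conc"),     # 1X final from a 10X stock
    ("MgCl2", 1.5, 50.0, "conc"),          # mM
    ("dNTPs (each)", 0.2, 10.0, "conc"),   # mM
    ("primer F1", 0.8, 10.0, "conc"),      # uM (800 nM final)
    ("primer R1", 0.8, 10.0, "conc"),      # uM (800 nM final)
    ("Platinum Taq", 1.25, 5.0, "units"),  # U per reaction from a U/uL stock
]


def reaction_volumes(components, reaction_ul, dna_ng, dna_stock_ng_per_ul):
    """Return {component: uL} for one reaction, with water making up the balance."""
    volumes = {}
    for name, final, stock, kind in components:
        if kind == "units":
            volumes[name] = final / stock                # units / (U/uL)
        else:
            volumes[name] = final * reaction_ul / stock  # C1V1 = C2V2
    volumes["genomic DNA"] = dna_ng / dna_stock_ng_per_ul
    water = reaction_ul - sum(volumes.values())
    if water < 0:
        raise ValueError("stocks too dilute for this reaction volume")
    volumes["water"] = water
    return volumes


for name, ul in reaction_volumes(COMPONENTS, 50.0, 200.0, 50.0).items():
    print(f"{name:>14}: {ul:.2f} uL")
```

Multiplying each volume by the number of samples (plus a small overage) gives a master mix, with DNA added separately per tube.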

Pyrosequencing
Primers were designed for the region of interest within the promoter of UGT1A1, specifically dbSNP ID: rs3064744, and ordered from Invitrogen.
The Pyromark sequence primer was 5′-