Technological Advances: CEBPA and FLT3 Internal Tandem Duplication Mutations Can be Reliably Detected by Next Generation Sequencing

Background: The detection of CEBPA and FLT3 mutations by next generation sequencing (NGS) is challenging due to high GC content and internal tandem duplications (ITDs), respectively. Recent advances have been made to surmount these challenges. In this study, we compare three commercial kits and evaluate the performance of these more advanced hybrid-capture and AMP chemistry-based methods. Methods: The amplicon-based TruSight Myeloid (TSM) 54-Gene Panel (Illumina) was evaluated against the hybridization-capture SOPHiA Genetics MSP and OGT SureSeq panels and the AMP chemistry-based VariantPlex panel (Archer) for wet-lab workflow and data-analysis pipelines. Standard kit directions and commercial analysis pipelines were followed. Seven CEBPA-positive and 10 FLT3-positive cases that had previously been missed on an amplicon NGS assay were identified. The average reads, coverage uniformity, and the detection of CEBPA or FLT3 mutations were compared. Results: All three panels detected all 10 CEBPA mutations and all 10 FLT3 ITDs with 100% sensitivity. In addition, there was high concordance (100%) between all three panels, which detected 47/47 confirmed variants in a set of core myeloid genes. Conclusions: The results show that NGS assays are now able to reliably detect CEBPA mutations and FLT3 ITDs. These assays may allow laboratories to forgo additional orthogonal testing for CEBPA and FLT3.


Introduction
Myeloid malignancies predominantly affect the blood and bone marrow and are driven by genomic abnormalities. Detection of mutations in myeloid malignancies is important given their impact on disease diagnosis, prognosis, and/or therapy [1,2]. For example, exon 9 CALR mutations can aid in the diagnosis of myeloproliferative neoplasms such as primary myelofibrosis. Furthermore, the type of CALR mutation is relevant since CALR type 1-like (c.1099_1150del) and CALR type 2-like (c.1154_1155insTTGTC) mutations have differential effects on prognosis [3].
There are more than 100 genes implicated in myelodysplastic syndrome and acute myeloid leukemia (AML) [4]. Next-generation sequencing (NGS) allows testing for large numbers of mutations using massively parallel sequencing, with a shorter turnaround time and lower cost than Sanger sequencing [5]. This makes it feasible to profile a patient's cancer to determine prognosis or define targeted therapy for optimization of patient outcomes.
However, NGS technology bears its own complexities and challenges. Mutations in some clinically important genes, namely CEBPA and FLT3, are difficult to detect using NGS methods. FLT3 mutations are found in 30% of AML cases and influence prognosis. Mutational impact can be further stratified based on allelic ratio, internal tandem duplication (ITD) size, karyotype, and co-mutations (e.g., NPM1). Overall, the presence of FLT3 mutations is associated with poor survival in AML patients [6,7]. The presence of FLT3 mutations also determines patient eligibility for therapy with tyrosine kinase inhibitors (e.g., midostaurin, sorafenib, gilteritinib) [8]. With regard to CEBPA, the presence of biallelic mutations in AML marks a distinct disease entity with a better prognosis compared to patients with single-variant or wild-type CEBPA [9]. The genetic profile in fact trumps the presence of morphologic dysplasia, and in the setting of dyspoiesis these cases are still classified as AML with biallelic CEBPA mutations rather than AML with myelodysplasia-related changes [10].
However, the CEBPA gene contains GC-rich regions, which are difficult to amplify and sequence [11,12]. FLT3 mutations typically involve internal tandem duplications (ITDs) in the juxtamembrane domain or point mutations/deletions in the tyrosine kinase domain (TKD). ITD mutations are twice as common as TKD mutations and can be large, which makes amplicons of various sizes challenging to sequence and align [12,13]. Because of these challenges and the clinical importance of detecting these mutations, laboratory workflows often employ duplicate testing of these genes on an orthogonal platform (e.g., Sanger or PCR) to avoid false negative results. This was indeed the practice at our high-volume academic center. However, with advancements in NGS and to increase efficiency in the lab, we explored commercially available solutions to surmount the limitations of our comprehensive genomic profiling (CGP) platform (TruSight Myeloid, Illumina) [14]. NGS kit manufacturers have developed different library preparation and analysis strategies to address challenges in difficult-to-sequence portions of the genome [15]. In this report, we describe the limitations we experienced with the TruSight Myeloid Sequencing panel (Illumina, San Diego, CA, USA) and evaluate the performance of enhanced CGP assays (optimized for FLT3 and CEBPA) from three different manufacturers: Myeloid Solution panel (SOPHiA Genetics, Saint Sulpice, Switzerland), SureSeq panel (Oxford Gene Technology, Begbroke, UK), and VariantPlex panel (ArcherDx, Boulder, CO, USA). We assessed wet-lab workflow as well as data analysis performance. The goal of this project was to clinically evaluate these panels to see if they could overcome the known limitations in CEBPA and FLT3 mutation detection that exist with an amplicon-based assay.

Clinical Specimens
Fifteen patient specimens submitted for an amplicon-based NGS myeloid panel were included in this comparative study. The specimens were selected because they tested positive on a gold-standard single-gene test (fragment analysis or Sanger sequencing) but were negative on an amplicon-based NGS assay. Seven of these samples were CEBPA positive, with a mix of point mutations and insertions/deletions. Ten of these samples were FLT3 positive, with ITDs of up to 107 base pairs.

TruSight Myeloid (TSM) Panel
Library preparation in the TSM Panel employs an amplicon-based approach. The test begins with hybridization of DNA (200 ng measured using NanoDrop) with a multiplexed pool of oligonucleotide probes. An initial correlation between Qubit and NanoDrop measurements indicated that NanoDrop measured DNA concentrations higher than Qubit (~2.5x). However, NanoDrop quantification (200 ng) yields consistent library complexity (based on over 5 years' worth of TSM panel data). Each oligo in the pool contains a target-specific sequence and an adapter sequence that is used in subsequent PCR amplification. An extension-ligation reaction extends across the target region, followed by ligation to merge the two probes and generate a library of new templates with common ends. The extension-ligation templates are amplified using PCR, which incorporates two unique, library-specific indexes. PCR products are converted to single-stranded fragments and normalized to equimolar concentrations (4 nM

VariantPlex Myeloid Panel
Libraries were prepared using the VariantPlex protocol (ArcherDx Inc., Boulder, CO, USA), which utilizes Anchored Multiplex PCR (AMP) technology to generate target-enriched, sequencing-ready libraries. The input DNA (200 ng measured using Qubit) is first enzymatically fragmented; the ends are then blunted, A-tailed, phosphorylated, and ligated with half-functional adapters. The adapters contain the universal primer binding sites, an index for Illumina instruments, and molecular barcodes for deduplication and error correction. The first PCR uses an anchored gene-specific primer 1 (GSP1), which amplifies against the P5 primer in the adapter. The second enrichment amplification uses a different, nested gene-specific primer 2 (GSP2) to increase amplicon specificity and add the read 2 primer binding site. The second primer is a hybrid that contains the P7 primer and the index 1 region for Illumina instruments. After this cycle, two indexes are present in every enriched DNA molecule. Data processing was completed on the Archer Analysis platform (ArcherDx Inc.) and included FASTQ trimming, read deduplication, genome alignment, and variant detection and annotation. SNPs and small insertions/deletions (indels) of <25 bp are called using FreeBayes and LoFreq. To aid in the detection of variants of interest, the ArcherDx variant caller Vision focuses on detecting SNPs and small indels of interest by using a targeted VCF file.
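The role of molecular barcodes in deduplication and error correction can be illustrated with a minimal sketch. It is not the Archer Analysis implementation, whose internals are not described here. It assumes, purely for illustration, that each read is reduced to a hypothetical (start position, barcode, sequence) tuple, that reads sharing a start position and barcode derive from the same original molecule, and that a per-position majority vote is an acceptable consensus rule.

```python
from collections import Counter, defaultdict


def umi_consensus(reads):
    """Collapse reads into one consensus sequence per (start, barcode) family.

    `reads` is an iterable of (start_position, barcode, sequence) tuples.
    This tuple layout is a hypothetical simplification, not a real file format.
    """
    families = defaultdict(list)
    for start, barcode, sequence in reads:
        families[(start, barcode)].append(sequence)

    consensus = {}
    for key, sequences in families.items():
        # Only compare positions that every read in the family covers.
        length = min(len(s) for s in sequences)
        # Majority vote per position suppresses isolated sequencing errors.
        consensus[key] = "".join(
            Counter(s[i] for s in sequences).most_common(1)[0][0]
            for i in range(length)
        )
    return consensus


# Illustrative toy reads: three PCR copies of one molecule (one copy carries
# an error at the fourth base) plus one read from a second molecule.
toy_reads = [
    (100, "ACGT", "TTGACC"),
    (100, "ACGT", "TTGACC"),
    (100, "ACGT", "TTGTCC"),
    (100, "GGCA", "TTGACC"),
]
collapsed = umi_consensus(toy_reads)
print(len(collapsed), collapsed[(100, "ACGT")])
```

In this toy input the four reads collapse to two molecules, and the single discordant base in the first family is outvoted by the other two copies.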

Myeloid Solution Panel
The Myeloid Solution Panel by SOPHiA Genetics is based on hybridization-capture chemistry. At least 200 ng of pure DNA (measured using Qubit) is essential for optimal library preparation. DNA is first enzymatically fragmented and then end-repaired and A-tailed using the Qiagen QIAseq FX kit. DNA is then ligated with adapters and dual indexes for sample multiplexing later in the process. Cleanup steps using magnetic beads are performed to remove unbound adapters and to size-select fragments (~400 bp). A few rounds of PCR amplification are performed to enrich DNA fragments carrying adapters. The libraries are cleaned using magnetic beads, then quantified and size-verified using TapeStation and Qubit. The libraries are next pooled into a single reaction. Myeloid Solution xGen Lockdown Probes are used to capture the regions of interest. The probe-target duplexes are purified using a streptavidin bead protocol. Post-capture amplification is performed to enrich the captured targets. The pooled libraries are again quantified and size-verified. The pooled libraries (1.8 pM) are then subjected to paired-end sequencing on a MiSeq (Illumina, San Diego, CA, USA) with 2 × 301 cycles using a Reagent Kit v2 600 cycles cartridge. Sequencing data are analyzed on the SOPHiA Data Driven Medicine (DDM) platform. The DDM pipeline uses PEPPER, a proprietary SOPHiA technology based on a realignment algorithm, which allows the detection of CALR 52 bp deletions and FLT3 ITDs of up to 177 bp.

SureSeq Panel
Genomic DNA (200 ng measured using Qubit) is enzymatically fragmented using New England Biolabs (NEB) dsDNA Fragmentase to generate fragments of appropriate size (distribution peak between 150 and 250 bp). The fragmented dsDNA is repaired with ER enzyme mix to create blunt ends. Simultaneously, a 3′ adenine overhang is created for adaptor ligation. High-fidelity PCR with a few cycles is used to amplify the library before hybridization and target capture. The amplified library is denatured and captured by biotinylated probes. The hybridized gene targets are then bound to streptavidin beads and washed to remove possible off-target DNA. After target capture, PCR is used to add the indexes that identify the sample of origin for each sequence in the NGS run. The dsDNA PCR products then include both index sequences and adaptor sequences. The prepared DNA libraries need to be multiplexed such that each index-barcoded sample is present in the same amount in the pooled sample. This is predicated on both accurate determination of peak size (bp), performed with the TapeStation High-Sensitivity Kit, and accurate determination of library concentration (ng/µL), performed with the Qubit High-Sensitivity assay. Data analysis is performed in the SureSeq Interpreter software, which uses the Qiagen Clinical Insight tool for interpretation of SNVs and indels.

Results
The four NGS panels were compared for the genes and/or exons covered, library preparation workflows, depth, uniformity and quality of coverage, variant allele fractions, and ability to detect variants.

Panel Content Comparison
All panels have some coverage for the following genes relevant to myeloid malignancies: ASXL1, CALR, CEBPA, DNMT3A, ETV6, FLT3, IDH1, IDH2, JAK2, KIT, KRAS, MPL, NPM1, NRAS, RUNX1, TET2, TP53, U2AF1, WT1. The panels, however, differ in the target regions for these genes. Some genes are fully covered by all panels, while other genes have coverage only of certain exons (Table 1). To facilitate fair comparison between the panels, a few representative exons from the core myeloid gene list were selected based on each kit manufacturer's claim about the region of interest (ROI) coverage. When multiple exons per gene had very similar coverage across all four panels (consistent depth among multiple samples and uniform across the ROI), only one of those exons was selected, as the others would not add value to the comparison.

Workflow Comparison
Each panel in this study uses a different library preparation approach for sequencing the genomic regions of interest. TruSight Myeloid was the only classic amplicon-based panel in this study. ArcherDx VariantPlex uses proprietary AMP chemistry, which is similar to amplicon chemistry but uses a nested PCR-like approach. SureSeq and Myeloid Solution are hybridization capture-based panels, primarily distinguished by the post-capture amplification in Myeloid Solution library preparation. Ease of use was evaluated based on the number of steps the assay requires and the stage at which the libraries were pooled. SureSeq does not pool the libraries until the denaturation step before loading on the sequencer, which makes it labor intensive because each individual library has to be carried to the end. The TruSight Myeloid panel required the fewest steps of the four panels. VariantPlex requires more steps than TruSight Myeloid because of the second PCR, but fewer steps than Myeloid Solution, which requires hybridization, capture, and post-capture amplification steps. Each library pool was sequenced using either a MiSeq or a NextSeq sequencer. TruSight Myeloid and VariantPlex libraries were sequenced using 2 × 151 bp cycles on NextSeq, with sequencing completed in 27 h. SureSeq libraries were sequenced using 2 × 151 bp cycles on MiSeq and took 24 h for run completion. Myeloid Solution libraries were sequenced using 2 × 300 bp cycles on MiSeq and completed in 65 h. The workflow comparison is summarized in Table 2.

Depth of Coverage Comparison
Read depth, or depth of coverage, is the number of reads mapped to a single genomic position after alignment and removal of duplicate reads. The mean read depth is calculated as the total number of bases aligned to the target region divided by the target region size. It indicates how many reads, on average, are aligned at a reference base position. In general, the sensitivity and repeatability of an assay are associated with coverage depth. The read depth of core myeloid genes in each panel is presented on a logarithmic scale in Figure 1. The TruSight Myeloid panel achieved the highest average coverage (18,015), followed by Myeloid Solution (2290), VariantPlex (2217), and SureSeq (692). Comparisons of duplicate reads, on/off-target reads, reads without inserts, etc. were not within the scope of this project, as the purpose of this study was to evaluate manufacturer-validated pipelines and analysis filters.
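The mean read depth definition above can be written out as a short calculation. The sketch below assumes the per-position depths over the target region are already available as a list of integers (for example, exported from a depth-reporting tool). The function name and the toy values are illustrative and are not data from this study.

```python
def mean_read_depth(per_base_depth):
    """Mean depth = total aligned bases over the target / target region size.

    `per_base_depth` holds one depth value per reference position in the
    target region, counted after alignment and duplicate removal.
    """
    target_size = len(per_base_depth)
    if target_size == 0:
        raise ValueError("target region is empty")
    total_aligned_bases = sum(per_base_depth)
    return total_aligned_bases / target_size


# Hypothetical 10 bp target region (illustrative depths only).
toy_depths = [100, 120, 110, 90, 80, 100, 105, 95, 100, 100]
print(mean_read_depth(toy_depths))  # 1000 aligned bases / 10 positions = 100.0
```

Summing the per-position depths gives the total number of aligned bases because each aligned base contributes exactly one unit of depth at its position.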

Coverage Uniformity Comparison
Coverage uniformity implies an equal distribution of reads along target regions. Uniform coverage reduces the amount of sequencing required to achieve sufficient coverage depth in targeted regions. NGS assays never achieve full uniformity because some targets are under-sequenced while others are over-sequenced. There is also unavoidable sequencing of off-target regions. To facilitate fair comparison between the panels, we stipulated that if the coverage of an exon was more than 20% lower or more than 20% higher than the average coverage of the core myeloid genes for that panel, then coverage for that exon was considered nonuniform (Table 3). Among the representative exons selected for comparison, the highest numbers of uniformly covered exons were in Myeloid Solution (29/39) and SureSeq (28/29). VariantPlex had the fewest uniform exons (5/39), and TruSight Myeloid had 13/39 uniform exons. Coverage uniformity is also evident from Figure 1, where a rounder circle with fewer spikes indicates more uniform coverage.
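The ±20% uniformity criterion can be sketched as follows. The sketch assumes that the panel average is the mean of the exon depths supplied to the function and that an exon lying exactly on the 20% boundary counts as uniform. Both are assumptions made for illustration, and the exon labels and depths are hypothetical.

```python
def classify_uniformity(exon_depths, tolerance=0.20):
    """Flag each exon as uniform when its depth is within ±tolerance of the mean.

    `exon_depths` maps an exon label to its mean depth for one panel.
    Returns a dict mapping each exon label to True (uniform) or False.
    """
    if not exon_depths:
        raise ValueError("no exons supplied")
    panel_mean = sum(exon_depths.values()) / len(exon_depths)
    lower = panel_mean * (1 - tolerance)
    upper = panel_mean * (1 + tolerance)
    return {exon: lower <= depth <= upper for exon, depth in exon_depths.items()}


# Hypothetical exon depths for one panel (panel mean = 1000).
toy_panel = {
    "GENE_A_E1": 1000,
    "GENE_B_E2": 1100,
    "GENE_C_E3": 500,   # more than 20% below the mean: nonuniform
    "GENE_D_E4": 1400,  # more than 20% above the mean: nonuniform
}
flags = classify_uniformity(toy_panel)
print(sum(flags.values()), "of", len(flags), "exons uniform")
```

Counting the True flags gives the numerator of the "uniform exons / exons compared" fractions reported per panel.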

Variant Detection Comparison
Different variant filtering strategies, optimized for each panel by the kit manufacturers, were applied to VCF files as per the respective bioinformatics pipelines. A variant allele fraction (VAF) cut-off of 5% was common among the panels. Only clinically relevant variants were chosen for comparison purposes. While the focus of this study was on CEBPA and FLT3, 47 clinically relevant variants from 15 samples were also included for accuracy comparison (Table 4). Seven CEBPA-positive and ten FLT3-positive cases were identified using two criteria: (1) tested positive on a single-gene test, and (2) tested negative on an amplicon NGS assay. Three CEBPA cases had dual CEBPA mutations. Overall, the CEBPA-positive cases carried four point mutations and six indels. In the ten FLT3-positive cases, the ITD length ranged from 21 to 107 bp. The CEBPA and FLT3 variants were orthogonally detected by Sanger sequencing and fragment analysis. The Myeloid Solution, VariantPlex, and SureSeq panels detected all 47 confirmed variants. TruSight Myeloid failed to detect 10 variants, nine of which lay in CEBPA or FLT3. There was an additional SRSF2 variant, p.P95_R102del, detected by all panels except TruSight Myeloid. While this variant was not orthogonally confirmed, manual review in IGV supported it as a real variant.
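The two quantities used in this comparison, the VAF cut-off and the detection sensitivity, can be expressed as simple calculations. The sketch below assumes that VAF is computed as variant-supporting reads divided by total reads at the position and that a variant exactly at the cut-off passes. The manufacturer pipelines apply additional filters that are not modelled here, and the variant identifiers are placeholders.

```python
def variant_allele_fraction(alt_reads, total_reads):
    """VAF = reads supporting the variant / total reads at that position."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def passes_vaf_cutoff(alt_reads, total_reads, cutoff=0.05):
    """Apply a 5% VAF cut-off (inclusive boundary is an assumption)."""
    return variant_allele_fraction(alt_reads, total_reads) >= cutoff


def sensitivity(confirmed_variants, called_variants):
    """Fraction of orthogonally confirmed variants that the panel also called."""
    confirmed = set(confirmed_variants)
    if not confirmed:
        raise ValueError("no confirmed variants supplied")
    return len(confirmed & set(called_variants)) / len(confirmed)


# Placeholder identifiers standing in for 47 confirmed variants.
confirmed = [f"variant_{i}" for i in range(47)]
print(passes_vaf_cutoff(30, 1000))        # 3% VAF: below the cut-off
print(passes_vaf_cutoff(80, 1000))        # 8% VAF: above the cut-off
print(sensitivity(confirmed, confirmed))  # all 47 called: 47/47
```

A panel that calls every confirmed variant scores 47/47, which is how the 100% concordance figure above is read.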

BAM Tracks Comparison
In Figure 2, the BAM tracks from all four NGS targeted myeloid panels are loaded in IGV for comparison of the CEBPA coverage. The TruSight Myeloid panel has the least coverage of the CEBPA gene because it is an amplicon-based assay and the exon has a GC-rich midsection. The VariantPlex panel fully covers the CEBPA exon. However, the coverage is not homogeneous because the assay takes a nested PCR-like approach. Myeloid Solution has a drop in coverage in the midsection despite being a hybrid capture assay. However, there is still coverage of about 600x in this area, and the peaks are at 5000x. The SureSeq panel has the most homogeneous coverage among all panels. However, the average coverage is only 700x. While the sequencing depth depends on the sequencing instruments and their capacity, we followed the manufacturer-recommended samples per run and per flow cell. Downscaling the coverage to that of the panel with the lowest sequencing depth was avoided because it would add bias to the data, as we would deviate from the manufacturer-recommended sequencing protocols.
Table 3. Coverage uniformity of core myeloid genes (E = exon).
In Figure 3, the BAM tracks from all four NGS targeted myeloid panels are loaded in IGV for comparison of the FLT3 coverage for exons 13-15. The Myeloid Solution and SureSeq panels have uniform coverage for all three exons. However, the coverage of SureSeq (790x) is about one-third that of Myeloid Solution (2380x). The coverage of the TruSight Myeloid panel appears to be more uniform than that of VariantPlex along the length of the amplicons.

Discussion
Next generation sequencing technologies refer to a constellation of sequencing methods that share massively parallel sequencing, high throughput, and lower cost [16]. In the clinic, this has allowed for comprehensive genomic profiling to facilitate the timely detection of genetic variants with diagnostic, prognostic, or therapeutic import. In myeloid malignancies, namely AML, the detection of FLT3 and CEBPA alterations is crucial for that very reason [9]. Unfortunately, until recently, limitations in the ability of NGS to detect alterations in these genes required duplicate testing by more sensitive orthogonal methods [17]. This prompted academic and private sector efforts to surmount this challenge, resulting in various commercially available solutions that claim to reliably detect mutations in these genes [18-20]. To increase efficiency in our laboratory workflow by eliminating duplicate testing, and to verify vendor claims of accuracy, we performed a head-to-head comparison of the TruSight Myeloid (Illumina, San Diego, CA, USA), VariantPlex (Archer), SureSeq (OGT), and Myeloid Solution (SOPHiA) panels in our CAP/CLIA-certified laboratory at a high-volume cancer center.
The hybridization capture-based panels (SureSeq and Myeloid Solution) and the proprietary AMP chemistry-based VariantPlex panel show promising results for detection of CEBPA and FLT3 variants, all demonstrating 100% sensitivity. Their unique chemistries and bioinformatics approaches gave them an advantage over the amplicon-based TruSight Myeloid panel, which detected only 8 of 17 FLT3 and CEBPA variants. All three panels also showed high concordance (100%), detecting 47/47 confirmed variants. This is significant given that detractors of personalized medicine have cited the lack of NGS reproducibility as an argument [21,22]. In this study, we show this not to be the case: reliable NGS results can be obtained across different platforms and sequencers with the current state of technology.
There were differences in coverage metrics between panels, but these did not prevent the panels from accurately calling the confirmed mutations. Overall, TruSight Myeloid had the deepest coverage, but its lack of uniformity leads to wasted sequencing. The highest uniformity for covered exons was found in the Myeloid Solution panel. All orthogonally confirmed mutations were detected by the three panels evaluated against the TruSight Myeloid panel. Analytical sensitivity, specificity, and precision studies were not within the scope of this study. We hope to address these details in a forthcoming publication describing the extensive validation work done to assess the precision and accuracy of a custom 98-gene panel, which was based on the encouraging data from this study. In conclusion, current NGS technologies appear to provide reliable and accurate detection of CEBPA and FLT3 variants, surmounting historical challenges with NGS.

Conflicts of Interest:
The authors declare no conflict of interest.