Validation of Ion Torrent™ Inherited Disease Panel with the PGM™ Sequencing Platform for Rapid and Comprehensive Mutation Detection

Quick and accurate molecular testing is necessary for the better management of many inherited diseases. Recent advances in next-generation sequencing (NGS), such as targeted panel-based sequencing, have enabled comprehensive, quick, and precise interrogation of many genetic variants. As a result, these technologies have become valuable tools for gene discovery and clinical diagnostics. The AmpliSeq Inherited Disease Panel (IDP) targets 328 genes underlying more than 700 inherited diseases. Here, we aimed to assess the performance of the IDP as a sensitive, rapid, and comprehensive gene panel test. A total of 88 patients with inherited diseases, whose causal mutations had previously been identified by Sanger sequencing, were randomly selected for this assessment. The IDP detected 93.1% of the mutations in our validation cohort and achieved high overall gene coverage (98%). The sensitivity for detecting single nucleotide variants (SNVs) and short indels was 97.3% and 69.2%, respectively. When coupled with the Ion Torrent Personal Genome Machine (PGM), the IDP delivers comprehensive and rapid sequencing of genes responsible for various inherited diseases. Our validation results suggest that, after the necessary clinical validation, this panel is suitable for use as a first-line screening test.


Introduction
The morbidity, mortality, and disability associated with inherited diseases can be greatly reduced or prevented by improving the accuracy and speed of molecular testing. Genetic advances over the past decades have led to the development of different mutation screening techniques,

Samples
We randomly selected samples from 88 Saudi patients with a previously confirmed genetic diagnosis. Written informed consent was obtained from all study participants, in adherence with the Declaration of Helsinki and according to the rules and regulations of the King Faisal Specialist Hospital & Research Centre (KFSHRC) Institutional Review Board (IRB) and Research Advisory Committee (RAC), under the following approved projects: (RAC#2020011, approved-present), (RAC#2050022, approved-present), and (RAC#2100001, approved-present). The conditions and corresponding genes tested in this study are listed in Table 1. A breakdown of our validation cohort by disease category, with the number of samples tested for each condition, is summarized in Figure 1. DNA was extracted from whole blood using the QIAamp DNA Blood Mini Kit (Qiagen, Hilden, Germany). DNA concentration was measured with the broad-range Qubit® 2.0 kit (Invitrogen, Carlsbad, CA, USA) according to the manufacturer's instructions.

Library Building and Sequencing
Ten nanograms of DNA were amplified with three primer pools using the Ion AmpliSeq Library Kit 2.0 (Thermo Fisher, Carlsbad, CA, USA) for 14 PCR cycles. The pooled PCR amplicons were first digested with FuPa reagent (Thermo Fisher) and then ligated to universal adaptors. Up to two samples were pooled for emulsion PCR (ePCR) using the Ion OneTouch™ 200 Template Kit (Thermo Fisher). The templated Ion Sphere particles from ePCR were enriched on the Ion OneTouch ES (Thermo Fisher) according to the manufacturer's instructions. Enrichment was assessed with the Ion Sphere Quality Control Kit (Thermo Fisher), and the products were measured on a Qubit® 2.0 Fluorometer (Thermo Fisher) according to the manufacturer's instructions. Template-positive Ion Sphere particles were sequenced with the Ion PGM™ 200 Sequencing Kit (Thermo Fisher) on the Ion Personal Genome Machine (PGM) using a 318™ semiconductor chip.

Data Analysis
BED files defining the IDP target regions were generated based on the hg19 (GRCh37) human genome build and used for read alignment and analysis. The resulting aligned reads (BAM files) were visualized and inspected with the Integrative Genomics Viewer (http://www.broadinstitute.org/igv). To obtain high-quality reads, adaptor sequences were trimmed and low-quality reads were excluded before the initial analysis with Torrent Suite v4.0 and its Variant Calling plugin (Thermo Fisher, https://github.com/iontorrent/TS). The resulting variants were then annotated using public and in-house databases, as previously described [22]. Following the software developers' recommendations, the Variant Calling plugin was set to include variants with a minimum coverage of 20×. Each variant call format (VCF) file contained around 800-1200 variants per sample, which initial filtering reduced to an average of 20 variants per sample.
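The 20× coverage cut-off described above amounts to a per-record filter on the VCF. The sketch below illustrates that idea in plain Python. It assumes the depth is read from an INFO key named `DP`; that key choice is an assumption (Torrent Suite may expose several depth fields), and the toy records are hypothetical. It is not the Torrent Suite implementation.

```python
from typing import Dict, Iterable, Iterator

MIN_DEPTH = 20  # minimum coverage used for the initial call set (from the text)


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO column ("KEY=value;FLAG;...") into a dict; flags map to ""."""
    fields: Dict[str, str] = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def filter_by_depth(vcf_lines: Iterable[str], min_depth: int = MIN_DEPTH,
                    depth_key: str = "DP") -> Iterator[str]:
    """Pass header lines through and keep records whose INFO depth >= min_depth.

    depth_key="DP" is an assumption; check which depth field your caller reports.
    Records lacking a numeric value for that key are treated as depth 0 and dropped.
    """
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        columns = line.rstrip("\n").split("\t")
        info = parse_info(columns[7])  # INFO is the 8th VCF column
        value = info.get(depth_key, "")
        depth = int(value) if value.isdigit() else 0
        if depth >= min_depth:
            yield line


# Usage with a toy, hypothetical VCF excerpt
example_vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t500\tPASS\tDP=45;AF=0.5",
    "chr1\t200\t.\tC\tT\t90\tPASS\tDP=12;AF=1.0",
]
kept = list(filter_by_depth(example_vcf))
```

The two header lines are passed through unchanged, so the filtered output remains a valid VCF stream; only the DP=12 record falls below the cut-off.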

Tertiary Analysis and Variant Validation
Tertiary analysis was carried out as previously described [22,23], after applying more stringent quality criteria: a minimum coverage of 50× and a minimum quality score of 350 for single nucleotide variants (SNVs) and 700 for indels. Briefly, VCFs that passed the initial quality check underwent a stepwise filtering process performed by two independent trained researchers, both blinded to the phenotype and to the original mutation of each sample. The first step excluded intronic variants, synonymous variants, and variants with a minor allele frequency (MAF) > 1% in international and/or local databases (Saudi Human Genome Program database, SHGP). In the second step, variant deleteriousness was assessed with three prediction tools (SIFT [24], PolyPhen-2 [25] and MutationTaster [26]), and variants predicted as "tolerated", "neutral", or "benign" were removed. In most cases, only homozygous changes were then retained, yielding at most three variants per sample. Finally, the causative pathogenic variant was selected from this shortlist according to the clinical diagnosis, and the results were compared with the original Sanger sequencing data to assess concordance.
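The stepwise filtering above can be sketched as a single pass over annotated variants. This is an illustration under stated assumptions, not the authors' pipeline: the `Variant` record, its field names, the consequence labels, and the gene names are hypothetical. The text does not say whether a variant is removed when one tool or all tools call it benign; the sketch assumes removal only when every available tool agrees.

```python
from dataclasses import dataclass, field
from typing import Dict, List

MIN_DEPTH = 50                          # stringent coverage cut-off (from the text)
MIN_QUAL = {"SNV": 350, "INDEL": 700}   # quality cut-offs (from the text)
MAX_MAF = 0.01                          # drop variants with MAF > 1%
EXCLUDED_CONSEQUENCES = {"intronic", "synonymous"}
BENIGN_LABELS = {"tolerated", "neutral", "benign"}


@dataclass
class Variant:  # hypothetical record layout
    gene: str
    var_type: str        # "SNV" or "INDEL"
    depth: int
    qual: float
    consequence: str     # simplified annotation label (hypothetical)
    max_maf: float       # highest MAF across the databases consulted
    predictions: Dict[str, str] = field(default_factory=dict)  # tool -> label
    zygosity: str = "het"


def tertiary_filter(variants: List[Variant], homozygous_only: bool = True) -> List[Variant]:
    shortlist = []
    for v in variants:
        # Quality gate: >= 50x and a type-specific minimum quality score
        if v.depth < MIN_DEPTH or v.qual < MIN_QUAL[v.var_type]:
            continue
        # Step 1: drop intronic/synonymous variants and common variants
        if v.consequence in EXCLUDED_CONSEQUENCES or v.max_maf > MAX_MAF:
            continue
        # Step 2 (assumption): drop only if every available tool calls it benign
        labels = [label.lower() for label in v.predictions.values()]
        if labels and all(label in BENIGN_LABELS for label in labels):
            continue
        # Step 3: keep homozygous changes (applied "in most cases" in the text)
        if homozygous_only and v.zygosity != "hom":
            continue
        shortlist.append(v)
    # Choosing the causal variant against the clinical diagnosis stays a manual step.
    return shortlist


candidates = [
    Variant("GENE_A", "SNV", 120, 900, "missense", 0.0,
            {"SIFT": "deleterious", "PolyPhen-2": "probably_damaging"}, "hom"),
    Variant("GENE_B", "INDEL", 80, 500, "frameshift", 0.0, {}, "hom"),  # QUAL < 700
    Variant("GENE_C", "SNV", 200, 1000, "synonymous", 0.0, {}, "hom"),
    Variant("GENE_D", "SNV", 150, 800, "missense", 0.05, {}, "hom"),    # MAF 5%
    Variant("GENE_E", "SNV", 150, 800, "missense", 0.0,
            {"SIFT": "tolerated", "PolyPhen-2": "benign"}, "hom"),
]
shortlist = tertiary_filter(candidates)
```

Each toy candidate fails a different step except GENE_A, which shows how the separate criteria add up. Note that the indel quality threshold is twice the SNV threshold.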

Sequencing Quality, Coverage and Overall Panel Performance
Sequencing and read-mapping quality metrics for the runs are summarized in Table S1. The average number of reads at Q17 was about 3.06 million per run, with an average base yield of about 392 Mbp. The panel comprises 10,309 amplicons, and the average read length was 135 bp at Q0 and 118 bp at Q20. On average, 95% of reads aligned to the target regions (the targeted gene set), and the overall average depth was 191×. Target base coverage at 1×, 20×, and 50× was 98.36%, 93.29%, and 89.34%, respectively (Table S1), and 95% of the amplicons were free of strand bias. The average coverage per gene was ~98% (Table S2). Of all amplicons, 253 (2.5%) had coverage below 90%, and 105 (1%), from 78 genes, performed suboptimally (Table S3). Overall, these metrics indicate good sequencing performance for the runs included in the analysis.
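Metrics such as mean depth and the percentage of target bases covered at 1×, 20×, and 50× can be derived from a per-base depth list over the target regions (for example, the depth column of `samtools depth -a -b` output). The sketch below assumes such a list is already in memory; the toy depths are hypothetical and unrelated to the study data.

```python
from typing import Dict, Sequence


def coverage_summary(depths: Sequence[int],
                     thresholds: Sequence[int] = (1, 20, 50)) -> Dict[str, float]:
    """Mean depth and % of target bases with depth >= each threshold."""
    n = len(depths)
    if n == 0:
        raise ValueError("no target bases supplied")
    summary = {"mean_depth": sum(depths) / n}
    for t in thresholds:
        covered = sum(1 for d in depths if d >= t)
        summary[f"pct_ge_{t}x"] = 100.0 * covered / n
    return summary


# Usage with ten hypothetical target bases
toy_depths = [0, 5, 25, 60, 100, 250, 30, 55, 10, 80]
summary = coverage_summary(toy_depths)
```

Because the thresholds are nested, the percentages can only decrease as the threshold rises, which matches the 98.36% > 93.29% > 89.34% pattern reported above.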

Variant Calling
Across the 88 analyzed samples, an average of 1058 variants per sample were called and annotated. Because the target region spans 1,509,563 bp, this corresponds to about seven variants per 10,000 bases per sample. For all samples combined, the total number of unique variants was 11,118, a rate of 7.36 variants per 1000 bases.
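The two variant densities above use different numerators: a per-sample mean, and the count of unique variants pooled across samples. Both use the same target length as the denominator. The arithmetic is reproduced below with the values given in the text.

```python
TARGET_BP = 1_509_563                 # total length of the target region (from the text)
MEAN_VARIANTS_PER_SAMPLE = 1058       # mean variants called per sample (from the text)
UNIQUE_VARIANTS_ALL_SAMPLES = 11_118  # unique variants across all samples (from the text)

# Per-sample density, expressed per 10,000 target bases
per_10kb_per_sample = MEAN_VARIANTS_PER_SAMPLE / TARGET_BP * 10_000
# Density of unique variants in the combined set, expressed per 1,000 target bases
per_kb_combined = UNIQUE_VARIANTS_ALL_SAMPLES / TARGET_BP * 1_000

print(f"{per_10kb_per_sample:.2f} variants per 10 kb per sample")
print(f"{per_kb_combined:.3f} unique variants per kb (combined set)")
```

The first value comes out at about 7.0 per 10,000 bases and the second at about 7.365 per 1,000 bases, consistent with the figures stated in the text.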

Variant Detection Yield
Detection yield was determined by assessing the concordance between the original mutation and the NGS result for each case. Two samples were excluded from the analysis because of amplicon failure or panel design. Using our analysis pipeline, the IDP detected 93.1% of the mutations in our validation cohort (97.3% of SNVs and 69.2% of indels; Table 2). When the two failed samples were included, the detection rate was 91% (96% for SNVs and 64.3% for indels). Mutations that are masked in Table 2 were revealed to the researchers for the purpose of the analysis. Undetected variants were attributed to low coverage or to the homopolymer effect (Table S4).
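Per-type detection rates like those above are sensitivities, TP / (TP + FN), computed against the Sanger-confirmed mutations. The sketch below matches calls to known mutations by exact (sample, chrom, pos, ref, alt). That matching rule is an assumption: real comparisons usually need left-aligned, normalized indels first. All samples and coordinates are hypothetical placeholders, not study data.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple


class Mutation(NamedTuple):
    sample: str
    chrom: str
    pos: int
    ref: str
    alt: str

    @property
    def kind(self) -> str:
        # Single-base REF and ALT -> SNV; anything else treated as an indel
        return "SNV" if len(self.ref) == 1 and len(self.alt) == 1 else "INDEL"


def sensitivity_by_type(truth: Iterable[Mutation],
                        calls: Iterable[Mutation]) -> Dict[str, float]:
    """Percent of known mutations recovered, per variant type and overall."""
    called = set(calls)
    total, detected = Counter(), Counter()
    for m in truth:
        for key in (m.kind, "ALL"):
            total[key] += 1
            if m in called:
                detected[key] += 1
    return {key: 100.0 * detected[key] / total[key] for key in total}


truth = [
    Mutation("S1", "chr1", 1000, "G", "A"),
    Mutation("S2", "chr2", 2000, "C", "T"),
    Mutation("S3", "chr3", 3000, "AT", "A"),   # 1-bp deletion
    Mutation("S4", "chr4", 4000, "C", "CG"),   # 1-bp insertion
]
# The insertion is mis-called with an extra G, a typical homopolymer-type error
calls = [truth[0], truth[1], truth[2], Mutation("S4", "chr4", 4000, "C", "CGG")]
rates = sensitivity_by_type(truth, calls)
```

Excluding failed samples from `truth` or keeping them is what separates the two sets of rates reported above.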

Discussion
In this study, we report the successful validation of the IDP as a comprehensive and sensitive assay for detecting causal mutations in a variety of inherited diseases. Using the IDP with the PGM, we achieved ~98% sequence coverage of the targeted regions, with an average depth of 191×. An average of 1058 variants per sample were detected before filtering. We detected 93.1% of the originally reported causal mutations. The remaining 6.9% of causal variants were missed because of inadequate coverage of challenging DNA regions with homology or high GC content (Table S4).
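Missed loci like these can be screened retrospectively by inspecting their reference context for high GC content and for homopolymer runs. Homopolymers are a well-known source of indel errors in semiconductor sequencing. In the sketch below, the cut-offs (GC > 65%, runs of ≥ 5 identical bases) are hypothetical and chosen only for illustration. The input is a reference sequence window around the variant, assumed to be extracted beforehand.

```python
from typing import List


def gc_fraction(seq: str) -> float:
    """Fraction of G or C bases in a sequence window."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(1 for base in seq if base in "GC") / len(seq)


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated base."""
    best = run = 0
    previous = ""
    for base in seq.upper():
        run = run + 1 if base == previous else 1
        previous = base
        best = max(best, run)
    return best


def flag_locus(context: str, gc_max: float = 0.65, homopolymer_min: int = 5) -> List[str]:
    """Return hypothetical difficulty flags for a reference window around a variant."""
    flags = []
    if gc_fraction(context) > gc_max:
        flags.append("high_GC")
    if longest_homopolymer(context) >= homopolymer_min:
        flags.append("homopolymer")
    return flags


print(flag_locus("ATGCAAAAAAGTC"))   # six-A run
print(flag_locus("GCGGCCGCGGAT"))    # GC-rich window
```

The first window is flagged only for its six-base A run, and the second only for its GC content (10 of 12 bases).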
A wide range of disease-focused or comprehensive gene panels for inherited diseases is commercially available and in use in clinical laboratories on various NGS platforms [17,27]. Examples of comprehensive inherited disease panels other than the one assessed here include the Otogenetics panel and the TruSight Inherited Disease Panel (Table S5). The Otogenetics panel comprises the largest number of genes (~4500); however, the subsequent data analysis and interpretation can be very challenging and time-consuming. Moreover, this panel is available only as a service. By contrast, both the IDP and TruSight are available as predesigned, ready-to-use panels. TruSight covers 552 genes focused only on severe recessive childhood-onset diseases, whereas the IDP surveys 328 genes implicated in > 700 childhood- or adult-onset inherited diseases. Both the IDP and TruSight offer a fast time to results. Delivering accurate results with a short turnaround time is imperative for any diagnostic test, as results may influence treatment decisions or prevent unnecessary interventions.
When choosing the most appropriate genetic testing strategy, clinicians often face the challenge of deciding which NGS-based approach (targeted panels vs. whole exome sequencing, WES) to pursue as first-tier genetic screening. Exome sequencing in clinical settings (clinical exome sequencing) allows unbiased evaluation of roughly 21,000 protein-coding genes. This is crucial in situations of diagnostic uncertainty and in diseases with significant genetic and phenotypic heterogeneity. Because testing is not restricted to genes implicated in a particular disorder, it can also minimize the effect of diagnostic error. An important advantage of exome sequencing is its capacity to identify alterations in both well-characterized and novel genes, allowing data re-analysis in light of new gene-disease associations. Another advantage is that it can improve patient management by alerting physicians to unanticipated comorbidities that may alter the course of treatment or affect prognosis. On the other hand, its technical limitations include incomplete gene coverage (especially in problematic regions) and challenges in variant validation and interpretation [28,29]. A further limitation is that it generates a long list of variants, most of which are variants of uncertain significance (VUS) that are usually overlooked or filtered out. Some of these variants could be clinically relevant (i.e., represent actual mutations) but are impractical to validate. An additional ethical issue is the disclosure of incidental/secondary findings, which are not uncommon [30,31].
Targeted gene panels have the potential to overcome some of the current limitations of exome sequencing. They can offer superior coverage, reaching up to 100% when coupled with Sanger sequencing [32]. Unlike exome sequencing, gene panel analysis generates substantially fewer variants, making validation and interpretation much more efficient. This can minimize the chance of missing VUS with potential clinical relevance. More importantly, because the analysis is restricted to genes related to the primary clinical condition, incidental findings are only a minimal concern with targeted panels [27,33]. However, the major limitation of targeted gene panels is that a panel can become obsolete if its content is not regularly updated to keep pace with new gene discovery.
With regard to the choice of NGS-based genetic testing, there seems to be a general consensus on using targeted panels as first-tier genetic testing, particularly for diseases with distinct phenotypes and well-characterized underlying genes. Exome or genome sequencing, on the other hand, is recommended only for cases in which a molecular diagnosis cannot be established through targeted panel testing [34,35]. Recently, a targeted sequencing approach using 13 different gene panels covering the majority of OMIM-reported genes demonstrated high clinical sensitivity and specificity, providing evidence for the advantages of targeted panels over exome sequencing as a first-tier genetic testing approach [22].
Because samples were selected randomly, the performance of the IDP could not be evaluated for all disease categories, and certainly not for all phenotypes. Our samples represented various disease categories, the majority being metabolic disorders (Figure 1a). However, it is important to note that half of the Mendelian conditions prevalent in Saudi Arabia (thalassemia, lysosomal storage disorders, hearing loss, organic acidemias, and retinal dystrophies) are covered by the IDP [36].
In its current format, the assay is not intended to identify copy number variants (CNVs); however, algorithms for assessing this type of alteration could be incorporated into the bioinformatics pipeline after performing the necessary validation [37]. The sensitivity achieved by this panel meets the high standard required for research use. However, additional important quality measures, such as run-to-run or laboratory-to-laboratory reproducibility, should be evaluated before the assay is implemented clinically [38].
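One common family of CNV approaches for amplicon data compares each amplicon's normalized read count in a test sample with its mean in reference samples. A ratio near 0.5 then suggests a heterozygous deletion, and a ratio near 1.5 suggests a single-copy gain. The sketch below illustrates that read-depth idea only. It is not necessarily the method of [37], its cut-offs (0.65 and 1.35) are hypothetical, and it assumes every sample reports the same amplicon IDs. Real pipelines add steps such as GC correction and more robust normalization, because normalizing by total reads is itself slightly distorted by large CNVs.

```python
import math
from statistics import mean
from typing import Dict, List


def normalize(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale per-amplicon read counts by the sample's total read count."""
    total = sum(counts.values())
    return {amplicon: n / total for amplicon, n in counts.items()}


def amplicon_ratios(sample: Dict[str, int],
                    references: List[Dict[str, int]]) -> Dict[str, float]:
    """Ratio of the sample's normalized depth to the reference mean, per amplicon."""
    sample_norm = normalize(sample)
    ref_norms = [normalize(ref) for ref in references]
    ratios = {}
    for amplicon, value in sample_norm.items():
        ref_mean = mean(ref[amplicon] for ref in ref_norms)
        ratios[amplicon] = value / ref_mean if ref_mean > 0 else math.nan
    return ratios


def classify(ratio: float, del_max: float = 0.65, dup_min: float = 1.35) -> str:
    """Map a depth ratio to a coarse, hypothetical CNV label."""
    if math.isnan(ratio):
        return "no_call"
    if ratio <= del_max:
        return "deletion"
    if ratio >= dup_min:
        return "duplication"
    return "normal"


# Toy, hypothetical read counts for four amplicons
reference = {"a1": 100, "a2": 100, "a3": 100, "a4": 100}
test_sample = {"a1": 100, "a2": 100, "a3": 50, "a4": 150}
ratios = amplicon_ratios(test_sample, [reference, reference])
cnv_calls = {amp: classify(r) for amp, r in ratios.items()}
```

Amplicons with no reference reads yield `no_call` rather than a division error. In practice, a CNV call would be supported by several adjacent amplicons rather than one.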

Conclusions
This study demonstrated the suitability of the IDP as a rapid and comprehensive approach for screening a large number of genes responsible for over 700 different inherited diseases. It is worth noting that the reported detection yield of gene panels for inherited diseases varies widely (24-95%), placing the rate achieved in our study in the upper range [22,23,39-42]. Inherited diseases are expected to be frequently encountered in consanguineous populations. For instance, in Saudi Arabia, inherited conditions such as thalassemia, lysosomal storage disorders, hearing loss, organic acidemias, and retinal dystrophies are common [36,43]. In response, the Ministry of Health established two national molecular screening programs: newborn and premarital [44,45]. Incorporating a comprehensive gene panel (such as the IDP and other available panels [22]) as a second-tier test into ongoing public screening programs would enhance their performance by improving diagnostic accuracy and expanding the range of conditions for which screening is available.