A Rapid PCR-Free Next-Generation Sequencing Method for the Detection of Copy Number Variations in Prenatal Samples

Next-generation sequencing (NGS) is emerging as a new method for the detection of clinically significant copy number variants (CNVs). In this study, we developed and validated rapid CNV-sequencing (rCNV-seq) for clinical application in prenatal diagnosis. Low-pass whole-genome sequencing was performed on PCR-free libraries prepared from amniocyte genomic DNA. From 10–40 ng of input DNA, PCR-free libraries consistently produced sequencing data with high unique read mapping ratios, low read redundancy, low coefficient of variation for all chromosomes and high genomic coverage. In validation studies, reliable and accurate CNV detection using PCR-free-based rCNV-seq was demonstrated for a range of common trisomies and sex chromosome aneuploidies as well as microdeletion and duplication syndromes. In reproducibility studies, CNV copy number and genomic intervals closely matched those defined by chromosome microarray analysis. Clinical testing of genomic DNA samples from 217 women referred for prenatal diagnosis identified eight samples (3.7%) with known chromosome disorders. We conclude that PCR-free-based rCNV-seq is a sensitive, specific, reproducible and efficient method that can be used in any NGS-based diagnostic laboratory for detection of clinically significant CNVs.


Introduction
There are over 100 recurrent chromosome disorders that affect the human population [1]. These syndromes are caused by copy number variations (CNVs) that include both whole chromosome and segmental aneuploidies that arise during gametogenesis or in the preimplantation period of embryo development [2]. While the large majority of fetal CNVs lead to early miscarriage [3,4], a small proportion are developmentally competent and are compatible with a livebirth outcome. Approximately 15% of all congenital abnormalities are associated with a pathogenic CNV [5].
In prenatal diagnosis practice, women with a high fetal risk for a chromosome abnormality are usually referred for invasive testing by amniocentesis at 15-16 weeks gestation or by chorionic villus sampling at 10-11 weeks gestation. High risk is generally inferred from an abnormal maternal serum screening score, advanced maternal age, the presence of a soft ultrasound marker or an ultrasound structural abnormality [5,9]. Invasive chromosome testing for aneuploidy is routinely performed by karyotyping, which has a CNV resolution of around 5-10 Mb [5]. For identification of smaller CNVs, chromosome microarray analysis (CMA) using high-density SNP arrays is the most widely used method [10,11]. For both karyotyping and CMA, amniocytes are normally cultured to generate sufficient cells for analysis, and results are generally available after 2 weeks. More recently, next-generation sequencing (NGS) has been developed as an alternative method for detection of CNVs, with a detection resolution down to 0.1 Mb [12-14], which is sufficient to detect the vast majority of microdeletion and microduplication syndromes (MMS). In a recent large cohort study of patients referred for invasive testing, we demonstrated that CNV-seq applied to amniocyte DNA samples can provide an increased yield of pathogenic CNVs compared to karyotyping [15]. In a subsequent study of over 1000 women referred for invasive testing, low-pass whole-genome sequencing performed similarly to CMA for detection of clinically significant CNVs [16].
In the field of molecular diagnostics, CMA is still the preferred methodology for identification of clinically significant CNVs in prenatal samples. To advance the clinical application of NGS, improvements in the sensitivity, specificity, reproducibility and versatility of NGS-based CNV detection methods are still needed before this technology can be generally considered for routine chromosome testing. One essential part of the NGS workflow is the library preparation method, with most CNV-seq tests currently using a low DNA input (50-200 ng) and a PCR step to amplify genomic fragments. As an alternative approach, we developed and validated a PCR-free-based rapid CNV-seq (rCNV-seq) method suitable for analysis of low nanogram amounts of genomic DNA from uncultured amniocytes. In a prospective study of 217 patients referred for chromosome testing, we show that rCNV-seq of amniotic cell genomic DNA can reliably and accurately detect clinically significant CNVs.

Patient Samples
All blood and amniocentesis samples were collected at the Prenatal Care Unit of Peking Union Medical College Hospital (PUMCH). The research study was approved by the Ethics Committee for Drug Clinical Trials in PUMCH (approval number KS2019136), and written informed consent was obtained from each patient. For validation studies, peripheral blood samples (2 mL) were taken from patients with known chromosome disorders, and genomic DNA (gDNA) was extracted using the AxyPrep Mag Tissue-Blood gDNA kit (Axygen, Corning, NY, USA). For patients with a suspected chromosome abnormality due to indications such as advanced maternal age, an abnormal maternal serum screening result or a structural abnormality revealed by ultrasound, invasive testing was performed on amniocentesis samples (10 mL of amniotic fluid) obtained at 15-16 weeks gestation. The fluid was centrifuged, and gDNA from the amniocyte cell pellet was purified using the AxyPrep Mag Tissue-Blood gDNA kit. Prior to CNV testing, amniocyte DNA was checked for maternal cell contamination (MCC) using an STR-based semi-quantitative PCR assay [17]; MCC levels of <5% were considered acceptable for clinical testing.

Construction of NGS Libraries
An overview of the three library construction methods used in this study is shown in Figure 1. All genomic samples for library construction were quantified using Qubit 3.0 (Invitrogen, Waltham, MA, USA). For the PCR-free-frag library construction used by rCNV-seq (Method 1), gDNA (10-40 ng) was initially treated with dsDNA fragmentase at 37 °C (NEBNext dsDNA Fragmentase, New England Biolabs, Ipswich, MA, USA) to produce smaller fragments with an average size of ~200 bp. The gDNA fragments were then end-repaired, A-tailed and ligated with barcoded sequencing adaptors using a proprietary DNA repair kit (KR2000, Berry Genomics, Beijing, China) to generate libraries for sequencing. For PCR-free-soni (Method 2) and PCR-soni (Method 3), used for commercial CNV-seq, gDNA (1 µg for PCR-free-soni; 100-200 ng for PCR-soni) was sheared by sonication, and fragments of 350 bp were size-selected on agarose gels using the TruSeq DNA PCR-Free Low Throughput Library Prep Kit (Illumina, San Diego, CA, USA) and the TruSeq Nano DNA Low Throughput Library Prep Kit (Illumina, San Diego, CA, USA), respectively (Figure 1).

Copy Number Variation Sequencing and Data Analysis
Single-end sequencing was performed on the NextSeq CN500 platform (Illumina, San Diego, CA, USA) with a run time of 6.5 h to generate approximately 5 million raw 45 bp reads per sample. Raw reads were then trimmed to remove artificial adaptor sequences, and the resulting 36 bp genomic sequences were mapped to the hg19 reference genome using the Burrows-Wheeler algorithm [18]. On average, approximately 2.8-3.2 million reads were uniquely mapped for data analysis. Reads were allocated to 20 kb bins along the length of each chromosome, and CNVs were identified from 24 chromosome copy number (CN) plots, as previously described [13,19]. Duplications were defined as CN > 2.8, deletions as CN < 1.2, disomy as 1.8 < CN < 2.2, mosaic trisomy as 2.2 < CN < 2.8 and mosaic monosomy as 1.2 < CN < 1.8.
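The binning and copy number thresholds described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names, the median-based normalization to a disomic baseline of CN = 2, and the handling of values that fall exactly on a threshold (which the definitions above leave unassigned) are assumptions made for illustration only.

```python
from collections import Counter
from statistics import median

BIN_SIZE = 20_000  # 20 kb bins, as described in the text


def bin_reads(positions, bin_size=BIN_SIZE):
    """Count read start positions (0-based, one chromosome) per bin index."""
    return Counter(pos // bin_size for pos in positions)


def copy_number(bin_counts, baseline_counts):
    """Assumed normalization: scale bin counts so that the median of a
    disomic baseline corresponds to CN = 2."""
    ref = median(baseline_counts)
    return [2.0 * count / ref for count in bin_counts]


def classify_cn(cn):
    """Apply the CN thresholds stated in the text."""
    if cn > 2.8:
        return "duplication"
    if cn < 1.2:
        return "deletion"
    if 1.8 < cn < 2.2:
        return "disomy"
    if 2.2 < cn < 2.8:
        return "mosaic trisomy"
    if 1.2 < cn < 1.8:
        return "mosaic monosomy"
    # Values exactly on a boundary are not defined in the text (assumption).
    return "unassigned"


# Hypothetical bin counts, for illustration only.
calls = [classify_cn(cn) for cn in copy_number([100, 150, 50], [100, 100, 100])]
# calls == ["disomy", "duplication", "deletion"]
```

In practice, a call would be based on a run of consecutive bins rather than a single bin; the per-bin classification here only demonstrates how the thresholds partition the CN scale.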

Karyotyping
Cultured amniocytes were karyotyped by standard procedures [21]. Cytogenetic analysis of Giemsa-stained metaphase spreads was performed at a resolution of 320 bands.

Chromosome Microarray Analysis
CMA was performed using the CytoScan™ HD Array Kit (ThermoFisher Scientific, Waltham, MA, USA). The array contains more than 2.6 million SNPs and can detect copy number changes across the genome at a resolution of 25-50 kb. Genomic DNA samples were labeled and hybridized to the array according to the manufacturer's recommended protocol. Fluorescence signals were scanned using the GeneChip scanner (ThermoFisher Scientific, Waltham, MA, USA), and chromosome copy number changes were called by Applied Biosystems™ GeneChip Command Console Software (version 3.2.2).

High Performance of PCR-Free-Based rCNV-Seq
The performance of PCR-free-frag-based libraries (rCNV-seq) and two control libraries, PCR-free-soni and PCR-soni (CNV-seq) (Figure 1), was bioinformatically assessed for 17 replicate normal female gDNA samples extracted from postnatal blood samples (Figure 2). The percentage of uniquely mapped reads was significantly higher for PCR-free-frag (rCNV-seq) compared to PCR-free-soni (p < 0.0001) and PCR-soni (p < 0.0001) libraries. Both PCR-free-frag and PCR-free-soni libraries had a significantly lower read redundancy rate compared to the PCR-soni libraries (p < 0.0001). The coefficient of variation (CV) achieved with PCR-free-frag libraries was also lower across all chromosomes and for each individual chromosome, compared to PCR-free-soni and PCR-soni libraries. The median read sequencing depth, indicative of increased bin read numbers, was also higher for PCR-free-frag compared to PCR-free-soni and PCR-soni libraries. Further, the guanine/cytosine (GC) content of the sequencing reads was higher for PCR-free-frag than PCR-free-soni. Lastly, the Q30 values were significantly higher for PCR-free-frag compared to PCR-free-soni (p < 0.0001) and PCR-soni (p < 0.0001). Based on evaluation of these key quality control (QC) sequencing indicators, rCNV-seq using PCR-free-frag libraries showed the highest performance values.
We further assessed the impact of input gDNA amount on library yield, unique mapping ratio and redundancy for PCR-free rCNV-seq (Figure 3). Mean DNA library concentrations were low (<200 pmol) at input DNA amounts <10 ng. Higher mean library yields suitable for sequencing were obtained with DNA input levels ranging from 50-800 ng. Despite different DNA inputs (1-800 ng), the unique mapping ratio was relatively stable at 60-65%. In terms of read redundancy, both the lowest and the highest DNA inputs were associated with a slightly higher ratio. Based on these assessment criteria, rCNV-seq provided high-quality sequencing data at input DNA amounts of ≥10 ng.
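For readers unfamiliar with the QC indicators compared above, the following Python sketch shows one common way to compute three of them. The definitions are assumptions, since the exact formulas used in this study are not stated: unique mapping ratio is taken as uniquely mapped reads over total reads, redundancy as the fraction of reads that duplicate another read's mapped position, and CV as the population standard deviation of bin read counts divided by their mean. All function names are hypothetical.

```python
from statistics import mean, pstdev


def unique_mapping_ratio(n_unique, n_total):
    """Fraction of reads that map to exactly one genomic location."""
    return n_unique / n_total


def redundancy_rate(read_keys):
    """Fraction of reads whose (chrom, start, strand) key repeats an
    earlier read. Assumed definition of read redundancy."""
    keys = list(read_keys)
    return 1.0 - len(set(keys)) / len(keys)


def coefficient_of_variation(bin_counts):
    """Standard deviation of bin read counts relative to their mean."""
    return pstdev(bin_counts) / mean(bin_counts)


# Illustrative numbers only: 3.0 million uniquely mapped reads out of
# 5 million raw reads gives a ratio of 0.6.
ratio = unique_mapping_ratio(3_000_000, 5_000_000)
```

A lower CV means bin counts are more uniform along a chromosome, so a given copy number change stands out more clearly against the background noise.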

Validation of rCNV-Seq Using PCR-Free-Frag Library Preparations
Our newly designed rCNV-seq method based on PCR-free-frag libraries was further evaluated for the ability to detect known chromosome disorders previously detected by CMA. These included single samples identified with trisomies T21, T18 and T13 and sex chromosome aneuploidies (SCAs) 45,XO, 47,XXX, 47,XXY and 47,XYY. In addition, we selected MMS samples including three cases each of Cri du Chat and Williams-Beuren syndrome, two cases each of Wolf-Hirschhorn and DiGeorge syndrome and one case each of Prader-Willi/Angelman, Smith-Magenis and Miller-Dieker syndrome. Further, we included four additional samples with variants of uncertain significance (VOUS) involving CNVs < 1 Mb in size. In these experiments, we used a low DNA input of 40 ng as the starting template for PCR-free-frag library construction. Following rCNV-seq analysis of all samples, the specific CNV that was originally identified by CMA was also detected by rCNV-seq (Table 1, Figure 4). In addition, the CNV intervals closely matched those defined by CMA, and the expected CN of one for deletions and three for duplications was observed. Taken together, our rCNV-seq protocol was highly sensitive for detecting the underlying causative CNVs associated with these MMS.
In detailed reproducibility experiments using 10 replicate samples of 10 ng for library construction, we further evaluated the reliability of rCNV-seq to correctly detect the precise CNV interval, benchmarking it against the CNV interval previously defined by CMA (Figure 5). For 45,XO, one copy of chromosome X was clearly deleted (CN = 1) in all replicates. Likewise, for T21, the whole q arm of chromosome 21 was clearly duplicated

Diagnostic Performance of rCNV-Seq Using Prenatal Samples
To assess the diagnostic performance of rCNV-seq, 40 ng of amniocyte gDNA was analyzed from 217 pregnant women referred for chromosome testing for a variety of different clinical indications (Table 2). MCC checks confirmed that all amniocyte genomic DNA samples were >99% fetal DNA. In eight samples, rCNV-seq detected whole chromosome fetal aneuploidies, including T21 (n = 6), T18 (n = 1) and 47,XXX (n = 1). These aneuploidies were confirmed by follow-up karyotyping (Figure 6). There were no samples carrying pathogenic fetal CNVs associated with an MMS. However, eight samples were identified with small non-pathogenic duplications (0.96-1.81 Mb) that were classified as VOUS (Figure 7).

Discussion
In this study, we developed rCNV-seq, based on a PCR-free library sequencing methodology, for detection of clinically significant CNVs in prenatal samples. The optimized rCNV-seq method accommodates input DNA sheared enzymatically without additional size selection steps and combines end repair and dA (adenine)-tailing into one step, removing the need for a PCR process and making library preparation much faster and easier (Figure 1). Equally important, library construction can be completed in a single tube, making the workflow more conducive to automation. In validation studies, rCNV-seq was highly sensitive and specific for the detection of common whole chromosome aneuploidies such as T21, T18, T13, 45,XO, 47,XXY, 47,XXX and 47,XYY as well as for segmental aneuploidies associated with different types of MMS. Detection of CNVs was highly reproducible, even when gDNA amounts as low as 10 ng were used to construct PCR-free libraries. In clinical studies of amniocentesis prenatal samples at risk for a fetal CNV, we also showed the ability of rCNV-seq to correctly identify both pathogenic and VOUS CNVs.
One of the main drivers for reliable and accurate calling of CNVs is the quality and depth of the sequencing data that are binned across each chromosome. In this regard, rCNV-seq based on PCR-free libraries demonstrated improved performance over current CNV-seq methods, showing an average Q30 for the sequencing data as high as 94%. Compared to PCR-based libraries, PCR-free-frag libraries achieved a higher proportion of uniquely mapped reads, a lower read redundancy (fewer duplicate reads) and a smaller CV for all chromosomes, which are important parameters to improve the signal-to-noise ratio and allow more accurate detection of CNVs and their correct copy number. In addition, rCNV-seq exhibited further advantages for implementation in molecular diagnostic laboratories with NGS capacity. Firstly, rCNV-seq libraries can be generated in around 2 h compared to 4-5 h for current PCR-free and PCR-dependent commercial methods. Thus, from sample receipt in the morning and with a sequencing run time of 6.5 h on the NextSeq platform, it is possible to generate same-day results, improving the overall efficiency of laboratory workflow and use of staff time. Secondly, the method can generate representative libraries from a wide range of input DNA amounts, improving versatility for the analysis of different sample types and of compromised samples where the gDNA is limiting.
Benchmarking against CMA, rCNV-seq demonstrated a high degree of accuracy and reproducibility. For the majority of CNVs tested, there was a high concordance between the two methods for correctly calling the CNV interval and copy number. There were some exceptions in which the called CNV interval varied by up to 33%. These differences are probably related to the inherent limitations of both techniques, whereby CMA has reduced probe coverage for some genome positions, whereas rCNV-seq has reduced coverage in highly repetitive regions. While high sensitivity and specificity are important parameters for calling CNVs, reproducibility is also a key factor for defining the resolution of CNV detection, since each rCNV-seq analysis is based on a set of sequencing reads randomly mapped across the 24 chromosomes. Previous studies have shown that CNVs as small as 0.1 Mb can be detected by CNV-seq [13], but no data were provided on the reproducibility of detection. In this study, using gDNA inputs of 10 ng, high reproducibility of CNV detection was demonstrated for CNVs ranging from 31.9 Mb down to 0.7 Mb. Nonetheless, further studies on a range of smaller CNVs are still needed to determine the minimum resolution of rCNV-seq for a single-pass diagnostic test.
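As an illustration of how interval concordance between two methods might be quantified, the Python sketch below computes the reciprocal overlap and the relative size difference of two CNV calls. The text does not state which measure underlies the reported differences, so both metrics, the function names and the example coordinates are hypothetical.

```python
def overlap_bp(a, b):
    """Number of base pairs shared by two half-open intervals (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions (shared length / own length)."""
    shared = overlap_bp(a, b)
    return min(shared / (a[1] - a[0]), shared / (b[1] - b[0]))


def size_difference(reference, other):
    """Relative difference in interval length, with `reference` as baseline."""
    ref_len = reference[1] - reference[0]
    return abs((other[1] - other[0]) - ref_len) / ref_len


# Hypothetical coordinates (bp) for one CNV called by two methods.
cma_call = (1_000_000, 2_000_000)
seq_call = (1_100_000, 2_100_000)
ro = reciprocal_overlap(cma_call, seq_call)  # 900 kb shared out of 1 Mb each
```

Reciprocal overlap penalizes both shifted and mis-sized intervals, whereas the size difference alone would report these two hypothetical calls as identical in length.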
Based on our findings, rCNV-seq is also suitable as an alternative to CMA for routine prenatal testing where a gDNA sample is generally available from cultured amniocytes. However, there are also clinical situations whereby rCNV-seq has some clear advantages over CMA. Firstly, some women are referred for an invasive chromosome test close to the legal limit for termination of pregnancy, which is usually 20 weeks in most countries [22]. Secondly, the amniocentesis samples collected can be substandard, lacking sufficient fetal cells for analysis and, occasionally, there can be cell culture failure [23]. In these two scenarios, it may not be possible to use CMA methods, because they rely on cultured amniocytes (one to two weeks) and a minimum of 0.5-1 µg of input DNA for analysis.
In these urgent clinical situations, it will be possible to use rCNV-seq to achieve a rapid and accurate diagnosis using either low amounts of gDNA from uncultured amniocytes or even the available cell-free fetal DNA, which has previously been shown to be an ideal NGS template for CNV detection in pregnancies with fetal structural abnormalities on ultrasound [24].

Conclusions
PCR-free-based rCNV-seq is a rapid, sensitive, specific and efficient method that can be used in any NGS-based diagnostic laboratory to achieve a reliable and accurate CNV diagnosis. Further studies on a large number of amniocentesis samples are needed to define the true clinical utility and versatility of rCNV-seq for prenatal chromosome testing.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.
Conflicts of Interest: X.C., C.L., M.X., and D.S.C. are employees of Berry Genomics. None have stock or bond holdings.