A Whole Germline BRCA2 Gene Deletion: How to Learn from CNV In Silico Analysis

BRCA1/2 screening in Hereditary Breast and Ovarian Syndrome (HBOC) is an essential step for effective patients’ management. Next-Generation Sequencing (NGS) can rapidly provide high throughput and reliable information about the qualitative and quantitative status of tumor-associated genes. Straightforwardly, bioinformatics methods play a key role in molecular diagnostics pipelines. BRCA1/2 genes were evaluated with our NGS workflow, coupled with Multiplex Amplicon Quantification (MAQ) and Multiplex Ligation-dependent Probe Amplification (MLPA) assays. Variant calling was performed on Amplicon Suite, while Copy Number Variant (CNV) prediction by in house and commercial CNV tools, before confirmatory MAQ/MLPA testing. The germline profile of BRCA genes revealed a unique HBOC pattern. Although variant calling analysis pinpointed heterozygote and homozygote polymorphisms on BRCA1 and BRCA2, respectively, the CNV predicted by our script suggested two conflicting interpretations: BRCA1 duplication and/or BRCA2 deletion. Our commercial software reported a BRCA1 duplication, in contrast with variant calling results. Finally, the MAQ/MLPA assays assessed a whole BRCA2 copy loss. In silico CNV analysis is a time and cost-saving procedure to powerfully identify possible Large Rearrangements using robust and efficient NGS pipelines. Our layout shows as bioinformatics algorithms alone cannot completely and correctly identify whole BRCA1/2 deletions/duplications. In particular, the complete deletion of an entire gene, like in our case, cannot be solved without alternative strategies as MLPA/MAQ. These findings support the crucial role of bioinformatics in deciphering pitfalls within NGS data analysis.


Introduction
Heterozygous germline mutations in either BRCA1 or BRCA2 that impair their normal function confer significantly elevated risks of breast, ovarian, and other cancers [1]. BRCA1 gene deletions are not frequent, accounting for only 5-10% of all germline mutations and these are probably even less common in BRCA2 [2]. A wide range of genetic alterations occurring in BRCA genes may lead to a truncated or functionless protein, as reported [3] in Breast Cancer Information Core (BIC) database: these are most frequently classified as small deletions or insertions, non-sense mutations and splice variants. Nevertheless, an increasing number of large genomic rearrangements (LGRs), not detectable by current polymerase chain reaction (PCR)-based methods, has been identified in these genes [4].
To investigate the BRCA1/2 germline status of our Hereditary and/or Sporadic Ovarian Cancer women, we have recently validated a massive parallel sequencing (MPS)-based pipeline [5]. The diagnosis of HBOC is made following molecular genetic testing in an individual or family with a germline BRCA1 or BRCA2 pathogenic variant [6]. Our molecular diagnostic workflow is based on BRCA MASTR Dx kit by Multiplicom, a In Vitro Diagnostic Medical Device in Europe (CE-IVD) validated kit that is widely used for the genetic screening of whole coding and exon-flanking regions of BRCA1 and BRCA2 genes on Illumina platforms. Notably, MPS protocols have increased reliably and rapidly the range of information achievable from these second generation sequencing assays. Consequently, well designed PCR-based multiplex kit coupled with high-throughput bench-top hardware are now leading to a more accurate understanding of both qualitative and quantitative information concerning the germline status of many patients within a single run. We have implemented and internally validated our pipeline on MiSeq (Illumina, San Diego, CA, USA) that is now routinely used in our laboratory to genotype BRCA genes in more than two thousand patients per year. MiSeq system is a bench-top instrument that uses Sequencing-by-Synthesis (SBS) approach to generate up to 300 bp pair-end reads by means of solid-phase bridge amplification. The output ranges from 1 to 25 millions of reads per run, accounting for 540 Mega-base (Mb) up to 15 Giga-base (Gb). The DNA of each sample is tagged with specific sequences, or multiplex-identifiers (MIDs), prior pooling, and loading on the sequencer [7,8].
Since molecular diagnostic pipeline was specifically designed to guarantee quality control steps and confirmatory testing other than MPS assay throughout the entire workflow, Fragment Analysis (FA) has been early introduced to detect small indels within the PCR amplicons just before the MPS run. Furthermore, a set of in house bioinformatics tools was tailored to drive supplementary quality control and evaluation, encompassing the analysis of digital data that are generated throughout each analytical run. To detect pathological exon copy number changes within both genes among our tested patients, we started using Sophia DDM v3 commercial software. Finally, Sanger sequencing and both Multiplex Amplicon Quantification (MAQ) and Multiplex Ligation-dependent Probe Amplification (MLPA) analysis were performed as confirmatory tests for both variant calling and copy number variants, respectively. MLPA is a multiplex PCR method consisting of two adjacent probes that are linked by ligase when they hybridized on the target sequence. All of the ligated probes share identical sequences at their outermost ends permitting their simultaneous amplification using an unique primers pair. The amplification products are separated by capillary electrophoresis (CE) due to stuffer sequences of different lengths placed at the 3-prime end of one of each target-specific probe [9]. On the other hand, the MAQ method is based on multiplex PCR of several fluorescently labeled target and reference sequences. To detect copy number alterations, the fluorescent signals from the reference and target amplicons, obtained by test and control individuals, are evaluated after the fragment analysis performed on CE system [10,11].
To date, Nunziato et al. [12] recently published their single solution to easily address the germline status of a breast cancer patient carrying large BRCA2 exon 4-26 duplication. Therefore, the predictive algorithm used in their setup was able to match the correct CNV. On the other side, we underline that it is challenging to offer a single solution to address all of the genomic alterations occurring in routine analysis. As a matter of fact, an entire or very large gene duplication is straightforward to detect. In this targeted amplification strategy, where each gene turns into the reference for copy number evaluation of the spare one, this offsets and clarifies the opposite confidence in evaluating a duplication or a deletion of this extent. Nevertheless, we were able to correctly achieve the molecular diagnosis of whole BRCA2 gene deletion by means of the workflow described in this paper. Although this is an extremely rare scenario, we want to point out that commercial computational tools were not able to address the real copy number status of BRCA2 gene in the herein reported ovarian cancer case, where the complete deletion of BRCA2 gene was wrongly predicted as a complete duplication in BRCA1. We describe herein the complete laboratory setting of the workflow that is used to decipher and solve this issue.

Fragment Analysis (FA)
To check the performance of PCR-enriched bar-coded amplicon libraries on the samples amplified using BRCA MASTR Dx kit, five different plexes consisting of 93 amplicon generated on plex A (17 amplicons), plex B (20 amplicons), plex C (19 amplicons), plex D (18 amplicons), and plex E (19 amplicons), were fluorescently labeled and checked for semi-quantitative analysis on 3500 Sequence Analyzer (Appendix A).
Due to the lacking of internal CNV control in the BRCA MASTR Dx assay, the profiles of the amplicons representing both BRCA1 and BRCA2 library products were collected and compared with "True Negative CNV" (TN-CNV) dataset previously confirmed by MLPA/MAQ routine ( Figure 1a). To compare the electropherograms of each sample with TN-CNV, a curve-fitting algorithm was applied. Basically, a four-step Visual Basic for Applications (VBA) script was designed: (i) To import the electropherogram raw data of the sample, reference and standards for each plex; (ii) to convert the scan into base pairs data, applying a linear fitting to the datasets of the GeneScan 600 LIZ Size Standard (ThermoFisher, Waltham, MA, USA) assayed in the well of both the samples and reference; (iii) to identify and collect out-of-range amplicons, performing the gene normalization on the sample datasets. The peak heights of BRCA1 amplicons were used to normalize over reference and then the ratios of each BRCA2 amplicon were evaluated. A similar process was applied using BRCA2 heights as a reference: the results are then assembled and plotted; (iv) to generate an interactive plot representing sample against reference curves, featured with x-y axis adjustment controls. In this way, we were able to obtain a fast and fine alignment of each single amplicon in term of height ratios and base pairs distances.
Noteworthy, as shown in Figure 2a,b, a deletion of 2664 base pairs ranging from chr17:41256109 to chr17:41258773 (NC_00017.10, GRCh37.p13, Release 105) carried by a patient with complete deletion of BRCA1 exon 5, 6, and 7 (NM_007294.3, from c.135-223 to c.441+30) was early detected in the same experimental conditions (run data not reported). On the other hand, taking into account all five plexes, the interpretation of the candidate profile suggested two diametrically opposite conditions: the complete BRCA1 gene duplication as well as the whole BRCA2 deletion (Figure 2c,d).

MPS Analysis
The entire coding regions of the BRCA1 and BRCA2 genes were sequenced using BRCA MASTR Dx kit on MiSeq platform. MPS data were then analyzed to evaluate BRCA1 and BRCA2 amplicon read count (RC) ratios in each plex.
Based on our validated molecular diagnostic pipeline for BRCA genes alterations on MiSeq platform, the confidence intervals (CI99%) for each amplicon were already provided and herein applied in data filtering/checking during post-Next-Generation Sequencing (NGS) analysis step ( Figure 1b). Read Coverage analysis performed with our scripts reveals that global BRCA1/BRCA2 ratio in the five plexes of Sample 5 was extremely higher~1.3 and remarkably out of the range of 0.7 ± 0.2 (Mean ± DS) gathered from the other samples.
Moreover, coupled data from FA and RC analysis clearly suggested the needs of further CNV analysis to assess the real copy number status of BRCA genes. Thus, we performed an in silico CNV analysis after MiSeq run by means of commercial DDM bioinformatics tool provided by SOPHiA Genetics (SOPHiA Genetics, Saint-Sulpice, Switzerland). As expected, the CNV prediction data from SOPHiA DDM v3 matched with RC behavior in this sample as well as for the carrier of the deletion of 2664 base pairs, ranging from chr17:41256109 to chr17:41258773 (supplementary data: Figure S1).  Table A1, is reported. All of the amplicons with altered profile are flagged by an asterisk (b) BRCA1 and BRCA2 Read Coverage (RC) frequency values in all plexes of Sample 5, showing the relative amount of polymerase chain reaction (PCR) products belonging to BRCA1 amplicon and BRCA2 (red circles). The RC average and confidence intervals (CI99%) calculated using the run statistics (plotted as blue dashes and blue triangles, respectively).

MAQ Analysis
MAQ technique was able to detect BRCA2 complete deletion (Figure 3a-c). The panel b of the figure reports the electropherogram (Plex 1) of our patient where reduced peak heights of BRCA2 amplicons were present, as compared to those of wild type reference sample (shown in the Figure 3a).
All of the Dosage Quotient (DQ) values were about 0.5 for overall BRCA2 amplicons targeted by our method, as shown in the final DosPlot analysis, including four reference controls (Figure 3c). Finally, to confirm this result, we also used MLPA analysis, as reported below.

MLPA Analysis
MLPA test confirmed the complete BRCA2 deletion in our patient (Figure 3d-f). The panel e of the figure shows the patient's electropherogram with decreased peak heights of all the deleted BRCA2 exons; the obtained RPR (relative peak ratio) values (normal range 0.7-1.3) were about 0.5 (Figure 3f).

Discussion
Massive Parallel Sequencing technologies have increased the feasibility of molecular test screening in cancer diagnostics. Furthermore, several studies are ongoing in order to evaluate the benefits that are provided by this technology in routine diagnostics, above all in ovarian cancer BRCA1/2 assessment [13][14][15][16][17]. NGS platforms are able to provide reliable qualitative data: nevertheless, one of the main challenges regards its ability to precisely identify quantitative changes, like gain or loss in the genes investigated [18,19]. Allele status evaluation is becoming a powerful marker of genome instability, being one of the defected targeted by PARP-1 inhibitors. Therefore, both qualitative and quantitative complete BRCA1/2 NGS evaluation should be assessed to identify patients who can benefit from PARP-1 inhibitors treatment [20,21]. The strategies are able to detect larger deletions or duplications that may not directly translate from germline DNA to FFPE-derived DNA as consequence of several factors. Important features reflecting the limitations of tumor testing are mainly associated to smaller DNA fragment size, chemical modifications and chromosomal copy number changes, like aneuploidy, and the frequent instability of tumor samples. Consequently, these approaches can be really critical to achieve when BRCA1/2 testing is performed on both FFPE and fresh tumor level [17].
One of the main concerns regarding the bioinformatics tools is their ability to precisely predict copy number status, above all when a large alteration or rearrangement is present: these issues and pitfalls can be relatively more critical depending on type of starting materials (FFPE, rather than fresh or germline deriving DNAs).
In order to show one of the possible pitfalls occurring by using CNV bioinformatics prediction tools in the evaluation of BRCA1/2 allele status, we report herein an extremely rare case that is characterized by a complete deletion of BRCA2 genes found in our laboratory. At the beginning, our patient (namely, Sample 5) undergone QC test to verify the behavior of multiplex amplification products according to Multiplicom kit procedure. Consequently, we exported curve data and launched the VBA script to compared unknown sample against a pool of samples previously verified as being normal for BRCA1/2 copy number (TN-CNV). Therefore, during the analysis we were impressed due to atypical pattern that was contemporarily dealing with both BRCA1 duplication ( Figure 2c) and BRCA2 deletion (Figure 2d). The subsequent dataset that was obtained on MiSeq machine did not show any pathogenic variants on both genes: nevertheless, such BRCA1 polymorphisms were in heterozygous and/or homozygous, while 100% of BRCA2 variants resulted only in a homozygote status. Therefore, we evaluated the total read coverage of both BRCA1 and BRCA2 genes comparing these data with those obtained on the other samples processed within the same run. Clearly, BRCA1 amplicons appeared as overrepresented with an overall BRCA1/BRCA2 ratio (~1.3), which was twice when compared to that shown by the other samples. In order to decipher this feature, Sophia DDM software was used: unfortunately, the latter was still unable to correctly define CNV status on Sample 5 and the result obtain was inconclusive. We underline as the identification of small rearrangements was easily achieved with both BRCA MASTR Dx and Sophia DDM algorithms on other samples that were previously processed by our group. Not surprisingly, by means of MLPA and MAQ tools, we were not able to confirm the prediction given by Sophia DDM software. By contrast, we report herein the first case of entire BRCA2 gene deletion that is unique and rare in HBOC syndrome. A similar finding was recently published by Purshouse et al. [28] in the context of WGS analysis on prostate cancer tumor samples, where the whole BRCA2 deletion was exclusively somatic. Indeed, the present case regards the germline complete BRCA2 deletion occurring in a woman suffering from ovarian cancer in the context of HBOC syndrome.

DNA Samples
All of the subjects gave their informed consent for inclusion before they participated in the study. The study was conducted in accordance with the Declaration of Helsinki, and the protocol (Protocol ID: 0007205/16) was approved by the Ethics Committee of Università Cattolica del Sacro Cuore, Fondazione Policlinico Universitario Agostino Gemelli (Project ID: ESR14-10185, Approval date: 24/02/2016). Particularly, the patients carrying the whole BRCA2 deletion was 48 years women from Mediterranean Europe diagnosed with high grade serous ovarian carcinoma.
Genomic DNA was isolated from peripheral blood samples using High Pure PCR Template Preparation Kits (Roche Diagnostic, Indianapolis, IN, USA), eluted in 100 µL of Elution Buffer (Roche Diagnostic, Indianapolis, IN, USA), quantified by spectrophotometer at 260 nm, and stored at −20 • C until use. Only DNA meeting following requirements: OD260/280 ratio ≥1.7, concentration ≥15 ng/µL, no degradation signals visible on agarose gel, were used in the study.

MPS Analysis
The full coding sequences of BRCA1 and BRCA2 genes were PCR-enriched using the BRCA MASTR Dx v2.0 assay (Multiplicom, Niel, Belgium), according to the manufacturer's instructions. This diagnostic CE-IVD kit is fully compatible with reversible-terminator sequencing technology. Thus, the samples were assayed by means of library pipelines and ran on MiSeq (Illumina, San Diego, CA, USA) platform following our molecular diagnostic routine validated setting [5,20,21,29].
Before MPS run, fragment analysis (FA) was performed on Applied Biosystems 3500 Genetic Analyzer (Life Technologies, Carlsbad, CA, USA) as a Quality Control (QC) to test the enrichment efficacy, the amplicon elongation range, and the amount of primer dimers.
After MPS, the sequencing fastq.gz files from MiSeq were analyzed with a novel CE-IVD bioinformatics tool, namely Amplicon Suite (SmartSeq srl, Novara, Italy), to safely and rapidly investigate variant call (VC) and read count (RC) metrics of sequenced amplicons through its java-based client-server sftp service.
Two different in house scripts were also developed and applied: (i) To collect and analyze the FA profile of the five plexes concerning assayed bar-coded libraries. The extra files were generated and exported from Multiplex Amplicon Quantification (MAQ) software during amplicon library QC analysis. (ii) to deeply investigate read coverage data and PCR stoichiometry of the 93 amplicons belonging to each sample after each MiSeq run. The abovementioned scripts were early programmed in Visual Basic for Application 7 in Microsoft Office and they were completely rewritten and validated in a command line Unix environment by using a combination of bash, Python, Perl, and R software packages to fit to our analytical workflow requirements.
Copy Number Variant (CNV) analyses were also performed in silico on MiSeq dataset by means of commercial software Sophia DDM v3 (Sophia Genetics SA, Saint Sulpice, Switzerland).

MAQ Analysis
The MAQ v1.0 kit (Multiplicom, Niel, Belgium) was used according to reported instructions. Briefly, after DNA quantification, two steps were performed: PCR reaction and fragment analysis. In the first step, two multiplex PCR reactions were prepared for each patient: 20-50 ng of DNA were used in a final reaction volume of 15 µL, including 5 µL of Master reaction mix (Plex A or Plex B) and sterile distilled water. After 10 min to 98 • C, 23 cycles (95 • C 45 s, 60 • C 45 s and 68 • C 2 min) were performed with a final step to 72 • C for 10 min. Regarding fragment analysis, 2 µL of the MAQ PCR product was added to a well containing the size standard GS600 (Applied Biosystems, Warrington, UK) (0.3 µL) and 10 µL of HiDi-Formamide (Applied Biosystems, Warrington, UK). The run was performed on 3500 Genetic Analyser (Applied Biosystems, Warrington, UK) and MAQ-S v2.0 analysis software (Multiplicom, Niel, Belgium) was used for analysis results. Four healthy individuals were included in the analysis as reference controls.

MLPA Analysis
The SALSA MLPA kits for BRCA1 (P002-D1; MRC-Holland, Amsterdam, The Netherlands) and BRCA2 (P090-A4; MRC-Holland, Amsterdam, The Netherlands) were used for the relative quantification of all BRCA1/2 exons according to the manufacturer's instructions. Briefly, 100 ng of genomic DNA were denatured at 98 • C and hybridized with the specific MLPA probe mix at 60 • C overnight. After ligation reaction of annealed probes (54 • C for 15 min), the subsequent PCR reaction was performed for 35 cycles (30 s at 95 • C, 30 s at 60 • C, 60 s at 72 • C), with a final step at 72 • C for 20 min. 0.7 µL of amplification product were then mixed with 0.4 µL of the GS-600 Size Standard (Applied Biosystems, Warrington, UK), 10 µL of HiDi-Formamide (Applied Biosystems, Warrington, UK) and analyzed using an 3500 Genetic Analyzer (Applied Biosystems, Warrington, UK). The collected data were analyzed using Coffalyser.NET Software (MRC Holland, Amsterdam, The Netherlands). Three healthy males and three healthy females were included in the analysis as wild type controls.

Conclusions
We underline how the bioinformatics algorithms could be very powerful and useful in providing quantitative information regarding possible large rearrangements, above all when robust and well-designed NGS pipelines are set up. Nevertheless, PCR-based NGS assays cannot still correctly predict CNVs, with this task being challenging. Moreover, in silico analysis coupled with MPS platforms could be suitable for both qualitative and quantitative gene profiling (Figure 4). Herein, we emphasize as the risk of misinterpretation of patient's CNV status, performed by means of in silico tools, can be overcome through the deep knowledge of: (a) NGS chemistry biases; (b) complete pipeline design; and, (c) amplicon dynamics and behaviors within any run.
Our algorithm met this need since our pipeline was tailored using results obtained on thousands of samples consecutively processed with our validated diagnostic workflow: therefore, the pattern of any amplicon amplified within any plex was deeply characterized. This strategy could result as very promising for the identification pitfalls regarding copy number status assessment. This is the reason why laboratories providing BRCA1/2 NGS-based assays, should guarantee complete qualitative and quantitative analysis of both genes, in order to overcome these issues and facilitate patients' management. The codes of amplicons pooled in each plex were shortened. Reference pattern from Multiplicom documentation: BRCA MASTR TM GeneScan Profiles file is herein reported as a guide for amplicon identification.