CNV Detection from Exome Sequencing Data in Routine Diagnostics of Rare Genetic Disorders: Opportunities and Limitations

To assess the potential of detecting copy number variations (CNVs) directly from exome sequencing (ES) data in diagnostic settings, we developed a CNV-detection pipeline based on ExomeDepth software and applied it to ES data of 450 individuals. Initially, only CNVs affecting genes in the requested diagnostic gene panels were scored and tested against arrayCGH results. Pathogenic CNVs were detected in 18 individuals. Most detected CNVs were larger than 400 kb (11/18), but three individuals had small CNVs impacting only one or a few exons, which were thus not detectable by arrayCGH. Conversely, two pathogenic CNVs were initially missed, as they impacted genes not included in the original gene panel analysed, and a third one was missed as it was in a poorly covered region. The overall combined diagnostic rate (SNVs + CNVs) in our cohort was 36%, with wide differences between clinical domains. We conclude that (1) the ES-based CNV pipeline efficiently detects both large and small pathogenic CNVs, (2) CNV detection relies on uniform sequencing and good coverage, and (3) in patients who remain unsolved by the gene panel analysis, CNV analysis should be extended to all captured genes, as diagnostically relevant CNVs may occur everywhere in the genome.


Introduction
In clinical practice, the process of finding a molecular genetic diagnosis for rare genetic disorders is challenging. In spite of advances in laboratory technology in the last 10 years, approximately one-half to two-thirds of patients remain without a clear diagnosis, depending on their clinical manifestation [1][2][3]. Establishing a precise diagnosis is the first and important step to help the patient and his/her family, even in the absence of a specific therapy [4].
Human genetic disorders may arise from genetic variations that range in size from a whole chromosome down to a single-nucleotide variant (SNV). In between, a significant proportion of pathogenic variants is represented by sub-microscopic deletions/duplications of ∼50 nucleotides to thousands of nucleotides, that are collectively called copy number variations (CNVs). To detect CNVs, specific techniques such as genome-wide CNV detection (arrayCGH (aCGH)) or locus CNV detection (multiplex ligation-dependent probe amplification (MLPA)) are needed. They can detect CNVs either at a large scale (50 kb) or at the level of a single exon [5]. Optical genome mapping is a novel method allowing the detection with high accuracy of structural variants, particularly CNVs [6,7].
In routine diagnostics, there is no "perfect" genetic technology capable of the simultaneous detection of structural rearrangements, CNVs, SNVs, short-repeat expansions, etc. For this reason, multiple parallel or sequential investigations are often necessary. Guidelines have been set up for different clinical disease groups and diagnostic entities, suggesting specific diagnostic workflows that consider the feasibility, the costs, the conditions of reimbursement set by medical insurances, and the diagnostic yield [8,9].
For many conditions, such as congenital malformation syndromes or developmental disorders, it remains disputed whether the first-tier diagnostic test should be a next-generation sequencing (NGS)-based gene panel or a microarray-based chromosome analysis (arrayCGH), which is less expensive but has a lower diagnostic yield [10].
The genetic analyses of rare Mendelian disorders have changed tremendously over the past decades. Most of the technologies have evolved towards faster and cheaper analyses with a higher resolution. This is particularly the case for NGS. Sequencing of the entire coding regions of the patient's genome, known as exome sequencing (ES), has greatly progressed, with libraries targeting exons of all genes simultaneously and with improved coverage and uniformity. These advancements, in combination with bioinformatics software improvements, have allowed direct CNV detection from ES raw data to become increasingly precise [11][12][13]. Based on the coverage (read depth) of one patient compared to a set of patients sequenced at the same time and/or under the same conditions, NGS-based CNV software can detect regions with significantly more reads (gain of copies) or fewer reads (loss of copies) than the average [14].
Here, we implemented the detection and validation of pathogenic CNVs using ES data. We used ExomeDepth software [15] with default settings and subsequently parametrised and annotated the CNV results with an in-house bioinformatics pipeline to be used for ES-based CNV analysis in routine diagnostics. We applied our pipeline to 450 NGS patients sequenced between the years 2018 and 2020.

Patients Cohort
Patients recruited in this cohort were part of the diagnostic NGS routine and were seen in the Genetic Laboratory of the Genetic Medical Service, at CHUV, Lausanne, between 2018 and 2020. DNA of the patients was extracted after appropriate informed consent, from whole blood for all postnatal cases (n = 440) and from the skin (n = 2), trophoblast (n = 1), or cultivated amniotic liquid (n = 7) for prenatal cases, using the Blood DNA Kit on Maxwell® 16 or Maxwell® RSC. The median age of the postnatal patients was 22 years (ranging from 1 to 81 years); 237 were males and 203 were females. The ES-based CNV pipeline was run on ongoing diagnostic cases in parallel with the SNV pipeline.

Exome and Targeted-Exome Sequencing
Exome sequencing was performed after obtaining patients' appropriate informed consent using the SureSelect V7 exome kit from Agilent (Agilent Technologies, Santa Clara, CA, USA) on an Illumina HiSeq 2500 at the Genomic Technology Facility of the University of Lausanne. Targeted exome sequencing was carried out upon patients' appropriate informed consent using the TruSight One Expanded (TsoE) kit from Illumina in our genetic laboratory on an Illumina NextSeq 500. The raw NGS reads were aligned to the human reference genome GRCh37 using Novoalign from Novocraft (http://www.novocraft.com, accessed on 5 August 2017). The data cleanup, followed by variant calling, was performed according to GATK Best Practices recommendations (https://gatk.broadinstitute.org/, accessed on 5 August 2016) as already described in [16]. SNVs were annotated using Annovar [17] in combination with in-house developed scripts. Following the Swiss Society of Medical Genetics (SSMG) guidelines (https://sgmg.ch/, accessed on 5 August 2015), NGS analysis in routine diagnostics targets only genetic alterations impacting genes clinically relevant to the patient's phenotype. Thus, only SNVs impacting the requested gene panel were analysed, and they were subsequently filtered according to their quality, rarity, and impact on genes. The final classification of SNVs was carried out according to ACMG criteria [18].

Conventional CNV-Detection Methods: Microarray-Based Chromosome Analyses
Genome-wide CNV detection was performed using the Agilent microarray platforms (Agilent Technologies, Santa Clara, CA, USA): (1) Agilent SurePrint G3 Human CGH Microarray (4 × 180 K array) with an overall median probe spacing of 13 kb (11 kb in Refseq genes) and (2) Agilent SurePrint G3 Human (1 × 1 M High-Resolution Microarray) with an overall median probe spacing of 2.6 kb. Microarray processing was carried out according to the manufacturer's recommendations. The arrays were scanned using an Agilent Microarray Scanner (Agilent Technologies, Santa Clara, CA, USA) and analysed using Agilent Genomics Workbench Lite 6.5 software. In routine diagnostics, a hit is considered significant when a minimum of three consecutive probes deviate. The global diagnostic resolution for a 180 K array lies between 60 and 100 kb.

Conventional Locus CNV-Detection Method: MLPA
Locus-specific CNV detection was achieved using the following gene-specific MLPA probe mixes from MRC Holland (MRC Holland, Amsterdam, the Netherlands): P330-PCDH19 (PCDH19 gene), P033-CMT1 (PMP22 gene), P165-HSP (SPAST gene), and P461 DIS (STRC gene). MLPA analyses were carried out according to the manufacturer's recommendations and analysed using Coffalyser software (MRC Holland). In routine diagnostics, a hit is considered significant when a minimum of two consecutive probes deviate. If a single probe deviates, a second method is necessary to exclude a false-positive result due to an allelic dropout, often caused by the presence of an SNV in the probe-specific sequence (MRC Holland).

NGS-Based CNV Calling, Annotation, and Filtering
CNV calling was performed using the ExomeDepth software [15] with default settings, per batch of patients sequenced at the same time on the same machine and processed with the same bioinformatics procedures. A batch comprised 32 patients for ES data and 8 patients for the targeted exome (TsoE).
For each library target, the number of reads expected in each patient was computed from the coverage of that patient and of the entire batch, and it was compared with the number of reads actually observed in that patient's sequencing data [15]. Regions with statistically fewer reads in a patient than in the rest of the batch indicate a potential loss of copies, while regions with statistically more reads indicate a potential gain of copies. The magnitude of the coverage difference helps in assessing the ploidy of the CNV.
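As a rough illustration of the read-depth principle described above, the following Python sketch compares a patient's read counts per target with the counts expected from the rest of the batch. It is a deliberately simplified ratio test and not ExomeDepth's statistical model; the function name `call_targets`, the thresholds (0.7 and 1.3), and the toy read counts are assumptions chosen for illustration only.

```python
from statistics import median

def call_targets(patient, references, loss_below=0.7, gain_above=1.3):
    """Label each target 'loss', 'gain' or 'normal' from a read-depth ratio.

    patient: read counts per target for the test sample.
    references: read-count lists (same targets) for the rest of the batch.
    The thresholds are illustrative assumptions, not ExomeDepth parameters.
    """
    # Normalise each sample by its total read count to correct for depth.
    patient_total = sum(patient)
    ref_totals = [sum(r) for r in references]
    calls = []
    for i, count in enumerate(patient):
        observed = count / patient_total
        # Expected fraction of reads on this target: median over the batch.
        expected = median(r[i] / t for r, t in zip(references, ref_totals))
        ratio = observed / expected if expected > 0 else float("nan")
        if ratio < loss_below:
            label = "loss"
        elif ratio > gain_above:
            label = "gain"
        else:
            label = "normal"
        calls.append((label, round(ratio, 2)))
    return calls

# Toy batch: three reference samples and one patient, six targets each.
refs = [
    [100, 200, 150, 120, 300, 250],
    [110, 190, 160, 115, 310, 240],
    [95, 210, 145, 125, 290, 260],
]
patient = [105, 100, 155, 118, 305, 248]  # second target has about half the reads
for index, (label, ratio) in enumerate(call_targets(patient, refs)):
    print(index, label, ratio)
```

In this toy batch only the second target falls below the loss threshold. A real caller additionally models the variance of the counts, which is why uniform coverage across the batch matters.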
CNVs were annotated with an in-house pipeline, summarised in Table S1. The annotations contain the cytogenetic location of the CNVs and the distance from the beginning (pTer) or end (qTer) of the chromosome based on the number of library targets in those regions. The impacted genes were annotated with different information, including the RefGene name and the number of exons per gene impacted by the CNV, the associated disease description from OMIM if existing (https://omim.org/, accessed on 11 September 2021), the presence or absence of a pseudogene, the DOMINO score [19], and their haploinsufficiency and triplosensitivity according to the ClinGen Dosage Sensitivity Map (https://www.ncbi.nlm.nih.gov/projects/dbvar/clingen/, accessed on 5 August 2020). The annotation also reports the number of targets of the 180 K and 1 M arrayCGH libraries that lie within each CNV's boundaries. CNVs of the same type (gain or loss of copies) and ploidy were merged if they were separated by fewer than 3 targets, to avoid potential splitting of CNVs due to lower coverage in some targets. The frequency of each CNV was calculated in the batch of patients processed together and in an in-house database of CNVs containing results from previous batches.
As for the SNV analysis, and in accordance with the Swiss Society of Medical Genetics guidelines, CNVs were filtered based on their overlap with genes present in the clinical panel(s) requested by the patient's geneticists and/or clinicians.
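The merging rule described above can be sketched as follows. This is a minimal Python illustration and not the in-house pipeline itself: calls are represented by hypothetical library-target indices (`first_target`, `last_target`), and the distance criterion is interpreted here as fewer than 3 intervening targets, which is an assumption.

```python
from dataclasses import dataclass

@dataclass
class Call:
    chrom: str
    first_target: int  # index of the first affected library target (hypothetical)
    last_target: int   # index of the last affected library target (hypothetical)
    kind: str          # "gain" or "loss"
    copies: int        # estimated copy number

def merge_calls(calls, max_gap=3):
    """Merge neighbouring calls of the same kind and copy number.

    Two calls on the same chromosome are merged when fewer than `max_gap`
    targets lie between them (assumed reading of the merging rule).
    """
    merged = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.first_target)):
        prev = merged[-1] if merged else None
        gap_ok = (
            prev is not None
            and prev.chrom == call.chrom
            and prev.kind == call.kind
            and prev.copies == call.copies
            and call.first_target - prev.last_target - 1 < max_gap
        )
        if gap_ok:
            # Extend the previous call instead of starting a new one.
            prev.last_target = max(prev.last_target, call.last_target)
        else:
            merged.append(Call(call.chrom, call.first_target,
                               call.last_target, call.kind, call.copies))
    return merged

calls = [
    Call("1", 10, 12, "loss", 1),
    Call("1", 15, 18, "loss", 1),   # 2 targets after the first call: merged
    Call("1", 25, 26, "loss", 1),   # 6 targets further: kept separate
    Call("1", 28, 30, "gain", 3),   # close, but a different type: kept separate
]
for c in merge_calls(calls):
    print(c)
```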
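The panel-based filtering step can be illustrated with a short Python sketch. The data structure (a list of dictionaries with a `genes` field), the function name `filter_by_panel`, and the toy CNVs are assumptions made for the example; the sketch only shows the principle that a CNV is retained when it overlaps at least one requested gene.

```python
def filter_by_panel(cnvs, panel_genes):
    """Keep only CNVs that overlap at least one gene of the requested panel."""
    panel = {g.upper() for g in panel_genes}
    kept = []
    for cnv in cnvs:
        hits = sorted(panel & {g.upper() for g in cnv["genes"]})
        if hits:
            # Record which panel genes triggered the retention.
            kept.append({**cnv, "panel_genes": hits})
    return kept

# Toy example: one CNV overlaps the panel, the other does not.
toy_cnvs = [
    {"id": "cnv1", "type": "loss", "genes": ["STRC", "CATSPER2"]},
    {"id": "cnv2", "type": "loss", "genes": ["NBEA"]},
]
toy_panel = ["STRC", "PMP22"]
print(filter_by_panel(toy_cnvs, toy_panel))
```

A CNV affecting a gene that is absent from the requested panel is silently dropped by such a filter, regardless of its potential clinical relevance.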

Pathogenic CNVs
CNVs detected by the above-described pipeline were considered likely pathogenic or pathogenic if they had the following features: (1) rare or absent from the in-house database of CNVs and from control populations (gnomAD SVs v2.1 [20]; Database of Genomic Variants build GRCh37; nstd102 Clinical Structural Variants (formerly ISCA)); (2) impacting gene(s) associated with diseases similar to the clinical description of the patient; (3) impacting a gene or a region harbouring similar reported pathogenic CNVs (DECIPHER [21]; Human Gene Mutation Database®), and according to the Nowakowska CNV review [22], as well as ACMG and ClinGen recommendations [23]. For critical genes included in the detected CNVs, the ClinGen Dosage Sensitivity Map was used to assess the haploinsufficiency and triplosensitivity of the different genes, as well as gnomAD gene metrics [20]. This is particularly important for the classification of unknown CNVs.

Samples Selection
In the first phase, we selected 5 patients with known pathogenic CNVs previously detected by arrayCGH or MLPA (Table S2), and we sequenced their exomes in a batch with 27 other undiagnosed patients. After processing the entire batch with ExomeDepth software, we only analysed the CNVs detected in those 5 patients in order to develop and parametrise the NGS-based CNV pipeline. In the second phase, we applied this pipeline to all patients for whom ES was performed in 2018 and 2019 (n = 199). All potentially pathogenic CNVs detected were then validated with arrayCGH/MLPA (Table 1, Table S3), and the pipeline was fine-tuned at each new discovery. Finally, in the third phase, we applied the NGS-based CNV pipeline to all our NGS patients sequenced in 2020, including both ES (n = 93) and targeted-exome (n = 158) libraries (Figure S1). No validation was required for CNVs detected at this step, unless atypical CNVs, such as quadruplications, were found. However, most CNVs were confirmed later on by arrayCGH/MLPA for familial segregation and genetic counselling (Table 1). Notably, before the inclusion of the TsoE library kit in Phase 3, we tested its conformity with the ES library kit by comparing the SNV and CNV results of 10 samples sequenced with both libraries. The conformity tests were repeated annually with at least 4 samples to ensure the consistency of the results.

Disease Categories
The samples in this study were part of our routine diagnostic NGS analysis. Of the 450 patients processed with the NGS-based CNV pipeline, approximately half had a neurodevelopmental condition (n = 227), 63 had a neurodegenerative disorder, 40 suffered from renal disease, 35 from cardiac problems, 20 had connective tissue disease, 19 had vision problems, 17 suffered from hearing loss, 20 patients had a diverse set of diseases, and the remaining categories had fewer patients (Figure 1b). Notably, some of these patients had multisystemic diseases and therefore had multiple clinical gene panels analysed simultaneously. The majority of patients had only one category of disease requested (n = 398, 88.4%), but 2 (n = 41, 9.1%) or 3 or more (n = 11, 2.4%) categories per patient were also analysed.

Diagnostic Yield
Of the 450 individuals analysed by NGS for both SNVs and CNVs impacting genes in their requested clinical gene panels, we detected 162 individuals with diagnostic changes (SNVs and/or CNVs), giving an overall positive diagnostic rate of 36.0% (Figure 1a). A total of 230 cases remained negative, and 58 patients had variants classified as variants of unknown significance (VUS) in their molecular report, caused by SNVs only (n = 55) or by CNVs (n = 3).
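The proportions reported above can be reproduced with a few lines of Python. The counts are those stated in the text; the variable names are illustrative.

```python
total = 450
diagnosed, negative, vus = 162, 230, 58
vus_snv_only, vus_cnv = 55, 3

# The three outcome groups should partition the cohort.
assert diagnosed + negative + vus == total
assert vus_snv_only + vus_cnv == vus

diagnostic_rate = 100 * diagnosed / total
print(f"Diagnostic rate: {diagnostic_rate:.1f}%")
```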

Pathogenic CNVs per Disease Categories
The majority of pathogenic or likely pathogenic CNVs detected here were larger than 400 kb (n = 11/18, 61.1%), the cut-off used in prenatal arrayCGH testing, and impacted multiple genes (Table S3). These CNVs were associated with a neurodevelopmental phenotype (n = 8/11, 72.7%), neurodegenerative disease (n = 2/11, 18.2%), or renal disease (n = 1/11, 9.1%). They were all known deletions or duplications, detectable by various molecular and cytogenetic methods (Table 1). We also detected gene-size CNVs in four patients (n = 4/18, 22.2%). The last three patients (n = 3/18, 16.7%) had small CNVs of one or two exons in size. In total, CNVs in two patients (cases 3 and 13) would not have been detected by arrayCGH or MLPA, either because their small size meant that no arrayCGH targets covered them or because no MLPA kit targeting these exons was available. The heterozygous STRC deletion in case 12 would likewise not have been seen by arrayCGH, as there are few arrayCGH targets for the STRC gene owing to the presence of a pseudogene [25]. However, a few exons of STRC are targeted by MLPA, including some within the heterozygous deletion harboured by the patient, which allowed us to confirm the CNV.

Comparison of arrayCGH and NGS CNV Results
In our cohort of 450 NGS patients, 120 had arrayCGH analysis conducted either before the NGS analysis (n = 87), during the NGS analysis to validate any interesting CNV (n = 15), or after the NGS analysis was found negative (n = 18).
Amongst the 87 cases with a negative arrayCGH performed before the NGS analysis, we reached a molecular diagnosis in 34 patients (39.1%), mostly due to pathogenic SNVs (n = 33/34, 97%). One patient, referred because of global developmental delay and abnormal social behaviour (case 11), had a 1q21.1 microduplication originally annotated as a VUS in the arrayCGH results from 2013. The same microduplication was found during the NGS analysis, and the literature published after the original evaluation allowed this CNV to be reclassified as pathogenic [26][27][28] (Tables 1 and S3). Of the negative NGS cases, 18 patients had an arrayCGH analysis performed after the NGS analysis. Amongst them, three patients (n = 3/18, 16.7%) were found to have a pathogenic CNV that explained the patient's phenotype and was not detected by the NGS-based CNV pipeline (Table 2). These three cases are detailed in the next section.

Pathogenic CNVs Undetected by the NGS Pipeline
For patient 19, suffering from lissencephaly, the SNV and CNV analyses applied to a neurodevelopmental panel of 1462 genes came back negative. Full-genome arrayCGH analysis was requested later on, and a heterozygous deletion extending from the UTR region to intron 2 of PAFAH1B1 was found in the patient. Mutations in this gene have been associated with lissencephaly and subcortical laminar heterotopia, inherited in an autosomal dominant manner (MIM:601545, [29]). Interestingly, PAFAH1B1 was present in the NGS clinical panel analysed in our patient, but the deletion had not been detected by the CNV pipeline. Indeed, the first exon of PAFAH1B1 lies outside the open reading frame of the gene and was not targeted by the NGS library used for the patient. The second exon of the gene was present in the ES library, but it was generally poorly covered, even in patients with the two normal copies of the gene. Manual inspection of the ExomeDepth raw data of the patient's batch showed a reduction in the average read coverage of exon 2 in our patient, compared to other patients sequenced similarly, but the overall difference was too small for ExomeDepth software to call a heterozygous loss of copy for this single exon (Figure S2).
Patients 20 and 21 both had a negative NGS analysis targeting genes associated with neurodevelopmental disorders. Full-genome arrayCGH analysis was undertaken in a second step, and it revealed a pathogenic deletion impacting NBEA and PACS2, respectively. Variants in the NBEA gene were described in 23 patients with neurodevelopmental disorder with or without early onset generalised epilepsy (NEDEGE; MIM:619157) by Mulhern et al. in 2018 [30], but the gene was not registered as a morbid gene in OMIM until January 2021. PACS2 was also described in 2018, by Olson et al. [31], as being causative of neonatal-onset developmental and/or epileptic encephalopathy, facial dysmorphism, and cerebellar dysgenesis. As with NBEA, it was not included, at the time of the analysis, in the clinical panel requested for the analysis of the patient's NGS data.
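Why a single poorly covered exon may fail to reach a calling threshold can be illustrated with a simplified likelihood-ratio calculation. The Python sketch below uses a plain binomial model as a stand-in for the more elaborate model implemented in ExomeDepth, so it should be read as an illustration of the principle only. The function name `log10_lr_het_deletion`, the read counts, and the expected read fraction of 0.1 are hypothetical and are not the data of case 19.

```python
import math

def log10_lr_het_deletion(test_reads, ref_reads, expected_fraction):
    """log10 likelihood ratio of a heterozygous deletion versus normal copy number.

    test_reads: reads on the exon in the test sample.
    ref_reads: reads on the exon summed over the reference samples.
    expected_fraction: share of reads expected in the test sample when it
        carries two copies (hypothetical value in this sketch).
    A simple binomial model is assumed; overdispersion is ignored.
    """
    p_normal = expected_fraction
    # A heterozygous deletion halves the test sample's depth on this exon.
    p_del = 0.5 * p_normal / (0.5 * p_normal + (1.0 - p_normal))
    ll_del = test_reads * math.log(p_del) + ref_reads * math.log(1.0 - p_del)
    ll_norm = test_reads * math.log(p_normal) + ref_reads * math.log(1.0 - p_normal)
    return (ll_del - ll_norm) / math.log(10)

# Same relative drop in depth, at low and at ten-fold higher coverage.
low = log10_lr_het_deletion(6, 90, 0.1)
high = log10_lr_het_deletion(60, 900, 0.1)
print(f"low coverage: {low:.2f}; high coverage: {high:.2f}")
```

Under this model the evidence grows in proportion to the number of reads, so an identical relative reduction in depth supports a deletion ten times less strongly on the poorly covered exon. Real data are also overdispersed, which weakens the evidence further.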

Variants of Unknown Significance in arrayCGH Results
In the arrayCGH results, 24 patients had CNVs classified as VUS, with the arrayCGH analysis conducted before the NGS analysis in most cases (n = 22/24, 91.7%). These CNVs were further investigated in the NGS results, and 19 of them were detected as well. Of the five CNVs not detected, four were in regions without any NGS target, mostly large intronic or intergenic regions. The last undetected CNV, located in a coding region of the genome, was a polymorphic duplication, seen in the control population (gnomAD) and in other patients of the batch.
Globally, the CNVs found both in arrayCGH analysis and with the NGS pipeline are very similar in terms of the number of copies and in size with small differences at the borders of the CNVs (Tables 1 and S2). The boundaries of the detected CNVs depend on the targets present in the library used, both for arrayCGH and in NGS; the exact breakpoint is in general not known. Thus, the size of the CNVs detected by arrayCGH and the NGS pipeline corresponds to the minimal size of the event.
Interestingly, one case had a duplication in 1q23.3 present in mosaicism, and it was seen both in the arrayCGH results (50-55% of mosaicism) and in the NGS results (58% of mosaicism) (Table S4). The CNV was, however, classified as a VUS since it did not explain the phenotype of the patient.
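The relationship between the read-depth ratio and the proportion of cells carrying a mosaic CNV can be written down explicitly. The source does not describe how the mosaicism percentages were derived, so the Python sketch below is only one plausible way to relate the two quantities: it assumes an autosomal region with one extra copy (gain) or one missing copy (loss) in the affected cells, and the function name `mosaic_fraction` is hypothetical.

```python
def mosaic_fraction(depth_ratio, kind):
    """Estimate the fraction of cells carrying a single-copy gain or loss.

    depth_ratio: observed / expected read depth over the CNV (1.0 = two copies).
    With a fraction f of affected cells, the mean copy number is 2 + f for a
    gain and 2 - f for a loss, hence ratio = 1 + f/2 or 1 - f/2.
    """
    if kind == "gain":
        fraction = 2.0 * (depth_ratio - 1.0)
    elif kind == "loss":
        fraction = 2.0 * (1.0 - depth_ratio)
    else:
        raise ValueError("kind must be 'gain' or 'loss'")
    # Clamp to the meaningful range [0, 1].
    return max(0.0, min(1.0, fraction))

print(round(mosaic_fraction(1.29, "gain"), 2))  # 0.58
```

Under these assumptions, a duplication present in 58% of cells corresponds to a depth ratio of about 1.29, compared with 1.5 for a non-mosaic heterozygous duplication.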

Discussion
The gold standard tools for CNV detection in diagnostic settings are currently MLPA and arrayCGH analyses. The main advantage of an arrayCGH analysis is its genome-wide resolution, allowing for the discovery of large gains and/or losses of DNA copies independently of any gene panel. On the other hand, arrayCGH technology does not target small CNV events involving one or a few exons [32]. Indeed, we observed that arrayCGH would have missed such small variations in cases 3, 13, and 17, as there were not enough arrayCGH targets around each of those CNVs to call them reliably. Those small CNVs are the focus of MLPA analyses, which can be developed specifically for exons of genes known to be often affected by deletions/duplications, such as PKD1 or PKD2, causative of polycystic kidney disease [33]. However, some disorders, including neurodegenerative or neurodevelopmental disorders, show large genetic heterogeneity, making the development of targeted MLPA for all the exons of all these genes unfeasible. Additionally, neither arrayCGH nor MLPA can screen for the presence of SNVs.
Exome sequencing overcomes some of these difficulties. First of all, it allows for simultaneous detection of both SNVs and CNVs, thus eliminating the need to use multiple technologies in one patient and speeding up the diagnostic process. It also permits the identification of large and small CNVs at once (as previously detected by arrayCGH and MLPA, respectively). In our study, hearing loss was one of the diseases with the highest increase in diagnostic yield owing to the CNV pipeline. Pathogenic variants in STRC are the second most frequent cause of hearing loss [34,35]. However, a homology of more than 99% between the coding regions of STRC and its pseudogene pSTRC makes this gene difficult to study by either MLPA or Sanger sequencing, as it is very difficult to design appropriate MLPA targets and/or Sanger primers [25]. ArrayCGH is also not the best-fitting validation method for small deletions in STRC, owing to the paucity of arrayCGH targets in the STRC region (commercially available arrayCGH provides only one probe in the STRC region). NGS-based approaches have already been described as a viable and competitive alternative to classical methods for the detection of CNVs impacting STRC [36,37]. Moreover, in our experience with SNVs and CNVs detected in our STRC-positive patients, NGS with high-quality coverage (average coverage >100×) and proper library design was superior to all other techniques. In general, genes and regions of genes with high homology (e.g., SMN1, SMN2, HBA1, HBA2, IKBKG) are difficult to target and to sequence, and the ES-based CNV pipeline will not detect CNVs impacting those regions efficiently [38].
In spite of its clear advantages, ES-based CNV detection is no silver bullet and comes with some limitations. Different factors affect the efficiency of ES-based CNV detection. First, batches of samples processed in a similar fashion are needed for the analysis. Second, the coverage of sequencing data needs to be homogeneous within and among samples, so that true gains or losses of copies can be distinguished from the technical variability of sequencing. The sequencing coverage is particularly important to distinguish true heterozygous deletions from low-covered regions [39,40]. This is a known issue, particularly for the first exons of genes and for genes and exons in low-mappability regions [41]. As sequencing libraries and capture kits are constantly being improved, this issue may become less problematic in the coming years [42]. In our laboratory, we added extra steps to the library preparation protocols, verifying the patients' DNA quality (TapeStation® from Agilent) and DNA quantity (Qubit® from Invitrogen by Life Technologies and TapeStation) to ensure maximum uniformity when the patients' DNAs were pooled together before sequencing.
A further limiting factor of ES-based CNV detection is that only CNVs impacting well-covered coding regions of genes present in the sequenced library will be detected. Intergenic and intronic CNVs are, by default, not visible with the ES-based CNV pipeline, as seen in our study when comparing the NGS and arrayCGH results. The CNV boundaries detected by the ES-based CNV pipeline are, therefore, to be taken with caution, as most CNVs will likely start outside of the coding regions. Similarly, breakpoints located in low-covered regions might also go undetected.
In the current study, the ES-based CNV analysis was originally based on gene panels, as was the SNV analysis, and only variations impacting genes present in the requested gene panels were analysed. Given that more than 200 disease-associated genes are being discovered each year [43,44], it is challenging to keep the disease gene panels up to date, leading to CNVs being passed over, as seen in patients 20 and 21 (Table 2). In comparison, arrayCGH analysis is performed genome wide, and all CNVs impacting any gene would be examined. Yet, if the gene was not linked to a disease at the time of the analysis, the CNV would be at most classified as a VUS, as seen in case 11. Two options might be envisioned to allow for the detection of CNVs localised outside of the original gene panel. The first option is to open the CNV analysis to all the coding regions of the genome, similar to the genome-wide arrayCGH analysis, but this requires additional patient consent (at least in our institution and following the national Swiss SSMG guidelines). The second option is the frequent reanalysis of CNV data, similar to the reanalysis of SNVs, which has shown great improvement in the diagnostic yield in recent years [45][46][47][48]. It is, however, a time-consuming task, and its reimbursement by medical insurances may be difficult to obtain.
Two of our cases highlight the possibility that CNVs may be the consequence of unsuspected chromosomal rearrangements. Thus, one might consider the necessity of obtaining additional studies (such as a karyotype or a FISH analysis), particularly for CNVs affecting genes localised in the pTer or qTer regions of the chromosome, as they might reflect a larger structural rearrangement (ring chromosome, chromosomal translocation, etc.). Similarly, a gain of copy can be an interstitial duplication but may also result from an unbalanced chromosomal translocation or a supernumerary chromosome (e.g., Prader-Willi-Angelman duplication/maternal 15q duplication syndrome [49]) (case 18). Therefore, the discussion of molecular CNV findings should include experienced cytogeneticists.

Conclusions
The NGS-based CNV pipeline allowed us to efficiently detect pathogenic CNVs based on raw NGS data. It (1) improved the overall diagnostic rate of genetic disorders by 4.0% (18/450) in the population studied, including different disease entities, (2) removed the need for separate microarray-based chromosomal and MLPA analyses in some patients, (3) uncovered new CNVs that would not be detected by the aforementioned techniques, and (4) detected CNVs in patients for whom microarray-based chromosomal or MLPA analyses are not the recommended genetic tests based on the clinical indication. However, one of the current limitations is the need for sequencing that is both uniform and of high coverage, particularly for the first exon of genes. In its diagnostic application, ES-based CNV detection may exert its best efficacy when applied to the whole exome rather than to a restricted gene panel. It may also profit from periodic reanalysis of negative cases.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/genes12091427/s1, Figure S1: Study design and patients enrolled in each phase, Figure S2: Box plots of raw read counts per exon (1-11) of PAFAH1B1 gene (NM_000430.4, GRCh37). The red dots represent the number of read counts in case 19, while grey dots represent the read-count value of samples from the same batch of 32 individuals, Table S1: Details of the CNV annotations in the pipeline, Table S2: List of patients from Phase 1, to parametrise the NGS-based CNV pipeline, Table S3: Details of the CNVs discovered by the ES-based CNV pipeline, Table S4: CNVs classified as VUS.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement: The genetic data supporting the findings of this study contain information that could compromise the privacy and/or consent of the participants. We, therefore, provide only the causative CNV variants' details in the main manuscript tables and/or Supplementary Tables S1-S4.
Acknowledgments: This study would not have been possible without the contribution of the genetic physicians of the Division of Genetic Medicine at the Lausanne University Hospital, as well as our excellent colleagues from the other medical specialties.

Conflicts of Interest: The authors declare no conflict of interest.