Maternal Copy Number Imbalances in Non-Invasive Prenatal Testing: Do They Matter?

Non-invasive prenatal testing (NIPT) has become a routine practice in screening for common aneuploidies of chromosomes 21, 18, and 13 and gonosomes X and Y in fetuses worldwide since 2015 and has even expanded to include smaller subchromosomal events. In fact, the fetal fraction represents only a small proportion of cell-free DNA on a predominant background of maternal DNA. Unlike fetal findings that have to be confirmed using invasive testing, it has been well documented that NIPT provides information on maternal mosaicism, occult malignancies, and hidden health conditions due to copy number variations (CNVs) with diagnostic resolution. Although large duplications or deletions associated with certain medical conditions or syndromes are usually well recognized and easy to interpret, very little is known about small, relatively common copy number variations on the order of a few hundred kilobases and their potential impact on human health. We analyzed data from 6422 NIPT patient samples with a CNV detection resolution of 200 kb for the maternal genome and identified 942 distinct CNVs; 328 occurred repeatedly. We defined them as multiple occurring variants (MOVs). We scrutinized the most common ones, compared them with frequencies in the gnomAD SVs v2.1, dbVar, and DGV population databases, and analyzed them with an emphasis on genomic content and potential association with specific phenotypes.


Introduction
Since 2013, NIPT has advanced from investigating common trisomies to reaching the subchromosomal or even monogenic levels for certain medical conditions [1,2]. The reporting of CNV imbalances in terms of duplications and deletions as incidental findings made through NIPT varies widely among different NIPT providers. Most providers disclose CNVs that are expected to be clinically relevant and potentially actionable. The positive predictive value (PPV) for CNVs is significantly lower than that of common trisomies [3,4], but it is still higher than that of sex chromosomal aneuploidies, particularly monosomy X [5]. Since the vast majority of cell-free (cf) DNA is derived from the maternal genome, genome-wide approaches primarily reveal maternal imbalances with much higher resolution than fetal imbalances, even with low-coverage genomic sequencing. Although NIPT was initially developed to detect medical conditions of the fetus, it eventually came to represent a liquid biopsy of the mother on a diagnostic level. Although most maternal CNVs can be considered common and benign according to ACMG criteria or in silico CNV prediction tools, they can also provide insight into the mechanisms of multifactorial diseases and potentially have clinical relevance. Globally collected NIPT data could be used for genome-wide association studies (GWAS) in large-scale epidemiological studies. Benign CNVs have been proposed as major factors responsible for human diversity. Moreover, it has been recognized that CNVs can even affect the transcriptional activity and translational levels of adjacent genes [6]. It is therefore possible for CNVs that were initially considered benign to later be proven to increase susceptibility to multifactorial diseases or cause genetic diseases with late onset or incomplete penetrance. 
Clinical variability could also be explained in part by other genetic or environmental determinants, modifying factors of other genes, multigenic inheritance, imprinting, and unmasking of recessive genes. In 2015, Zarrei et al. compiled a CNV map of the human genome and estimated that 4.8-9.5% of the human genome consists of CNVs; they further identified approximately 100 genes whose loss is not associated with any severe consequences [7]. Previously, most CNV-focused population studies were conducted on clinically enriched populations with various conditions, such as cancer, obesity, idiopathic male infertility, Alzheimer's disease, schizophrenia, epilepsy, intellectual disability, autism, and even prion diseases [1,8-10]. To our knowledge, genome-wide analyses of CNVs in large, healthy populations are still insufficient or lacking, unlike genomic variation studies focused on single-nucleotide variations (SNVs). The vast majority of CNV data is derived from individuals of European descent residing in Western countries, which may account for the underestimation of genomic variants in other populations [11].
Although there are certainly many relatively common CNVs, herein we present a detailed analysis of the 20 most frequently observed maternal CNVs larger than 200 kb (average size 431 kb, median size 340 kb) in a cohort of pregnant women analyzed via genome-wide NIPT. These data were compared with publicly available databases, including gnomAD SVs v2.1, dbVar, and DGV, especially concerning European non-Finnish populations. A subset of findings of unknown or conflicting significance was assessed, with an emphasis on genomic content. Our overview focuses on a Central/Eastern European population (Slovakia, Czech Republic, and Hungary).

Patients and Sample Collection
Pregnant women who underwent NIPT as either first-tier or second-tier screening starting from the 11th gestational week were considered for this study. All participating women were recruited from prenatal obstetric centers across Slovakia, the Czech Republic, and Hungary between 2016 and 2019. Twin pregnancies were not excluded. All women provided signed informed consent for inclusion in the study before participation. It was anticipated that all participants would be clinically healthy or at least without known genetic abnormalities at the time of pregnancy. The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the Ethics Committee of the Bratislava Self-Governing Region on 30 June 2015 (03899/2015/HF). The results, unless truly pathogenic, were not disclosed to the participants.
Ten milliliters of maternal peripheral blood was collected into a blood tube containing EDTA or a Cell-Free DNA BCT tube (STRECK, La Vista, NE, USA). Plasma was prepared within 36 h after collection (a longer time was acceptable for STRECK) using a two-step centrifugation protocol. The whole blood sample was first centrifuged at 1600× g for 10 min at 4 °C, followed by a subsequent centrifugation step at 16,000× g for 10 min. All subsequent molecular tests, including cell-free DNA isolation, modified genomic library preparation with Illumina TruSeq Nano chemistry, and DNA sequencing, were performed as previously described [12].

Bioinformatic Analysis for CNVs
Sequencing reads were aligned to the hg19 reference (NCBI build 37) using the Bowtie2 algorithm [13]. Read counts were collected per 20 kb bin. Then, two-step normalization was applied, which included locally weighted scatterplot smoothing (LOESS) [14] and PCA normalization to remove higher-order population artifacts on autosomes [15]. Finally, the signal was split into regions with equal-level signals using the circular binary segmentation algorithm from the R package DNAcopy [16]. The resulting data were visualized using an in-house CNV caller tool [17] (Figure 1). These figures were automatically generated for each chromosome, including X and Y.
Figure 1. The light gray vertical band depicts an unmappable centromere region. Black horizontal bands signify bins that did not pass quality metrics (centromere) and were thus excluded from the analysis. The approximated z-score for CNV is displayed over the magenta segment.
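The binning step described above can be illustrated with a minimal sketch. It only shows how read start positions could be counted in fixed 20 kb bins and scaled by the median bin count; the median scaling is a simplified stand-in and not the LOESS and PCA normalization used in the study. The function names, the toy chromosome length, and the read positions are hypothetical.

```python
from collections import Counter
from statistics import median

BIN_SIZE = 20_000  # 20 kb bins, as stated in the text


def bin_read_counts(read_starts, chrom_length, bin_size=BIN_SIZE):
    """Count reads per fixed-width bin from 0-based read start positions."""
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    counts = Counter(
        pos // bin_size for pos in read_starts if 0 <= pos < chrom_length
    )
    return [counts.get(i, 0) for i in range(n_bins)]


def median_ratio(counts):
    """Scale bin counts by the median of non-empty bins.

    Placeholder normalization only; the study applied LOESS and PCA instead.
    """
    nonzero = [c for c in counts if c > 0]
    ref = median(nonzero) if nonzero else 1
    return [c / ref for c in counts]


# Hypothetical toy chromosome of 100 kb with a handful of read starts.
reads = [100, 15_000, 19_999, 20_000, 45_000, 45_500, 61_000, 99_999]
counts = bin_read_counts(reads, chrom_length=100_000)
ratios = median_ratio(counts)
```

In a real pipeline the per-bin ratios would then be passed to the normalization and segmentation steps named in the text.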
The average sequencing depth of our NIPT method for each sample was between 0.12 and 0.5×. At least 5 million sequencing reads per sample, with no upper limit, were obtained using a mid-throughput NextSeq 500/550 sequencer (Illumina, CA, USA), primarily for the analysis of fetal aneuploidy. Segments longer than 200 kbp with abnormal gain or loss and a signal deviation exceeding 75% were designated as maternal and annotated using DECIPHER [18] and the X-CNV tool [19]. Each CNV call on genome assembly GRCh37 (hg19) was lifted over to hg38 using a web-based tool [20] featured in the UCSC Genome Browser [21]. Hg38 coordinates were recorded in DECIPHER.
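The selection rule for maternal segments can be sketched as follows. The text does not define how the 75% signal deviation is measured, so this sketch assumes a normalized ratio scale in which 1.0 corresponds to two copies and a heterozygous one-copy change shifts the ratio by 0.5; the deviation is then expressed as a fraction of that expected shift. The `Segment` class, the `classify` function, and all example values are hypothetical.

```python
from dataclasses import dataclass

MIN_LENGTH_BP = 200_000    # size threshold stated in the text
MIN_DEVIATION = 0.75       # "signal deviation exceeding 75%"
EXPECTED_HET_SHIFT = 0.5   # assumption: one-copy change shifts the ratio by 0.5


@dataclass
class Segment:
    chrom: str
    start: int
    end: int
    ratio: float  # normalized signal, 1.0 = two copies (assumed scale)

    @property
    def length(self):
        return self.end - self.start


def classify(segment):
    """Return 'gain', 'loss', or None under the assumed thresholds."""
    deviation = abs(segment.ratio - 1.0) / EXPECTED_HET_SHIFT
    if segment.length <= MIN_LENGTH_BP or deviation <= MIN_DEVIATION:
        return None
    return "gain" if segment.ratio > 1.0 else "loss"


# Hypothetical segments on placeholder chromosomes.
segments = [
    Segment("chrA", 1_000_000, 1_360_000, 1.48),  # long, near a full one-copy gain
    Segment("chrA", 5_000_000, 5_100_000, 1.50),  # strong signal but too short
    Segment("chrB", 2_000_000, 2_400_000, 1.05),  # long but weak signal
    Segment("chrB", 8_000_000, 8_300_000, 0.52),  # long, near a full one-copy loss
]
calls = [classify(s) for s in segments]
```

Under these assumptions, a weak shift such as the third segment would be more consistent with a fetal-only event than with a maternal one, which is why it is not retained.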

Statistical Analysis
The presence of MOVs in our dataset was compared to population frequencies in the merged databases dbVar/DGV [19] and gnomAD SVs v2.1 [22] for non-Finnish European populations and then evaluated using chi-squared statistical tests. DGV and gnomAD variants represent a curated set of variants from selected studies with high resolution and quality evaluated for accuracy and sensitivity. Therefore, an overlap with DGV and gnomAD variants indicated that our CNV calls were likely to be true positives. For two particular variants, only partial overlaps were found in gnomAD, and a test was performed for cases when these were counted as matches.
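A frequency comparison of this kind can be sketched as a Pearson chi-squared test on a 2×2 table of carriers and non-carriers. The text does not state whether a continuity correction was applied, so this sketch assumes none. The function name and the carrier counts are hypothetical and serve only to illustrate the calculation.

```python
import math


def chi2_2x2(carriers_a, n_a, carriers_b, n_b):
    """Pearson chi-squared test (1 df, no continuity correction) for two proportions."""
    a, b = carriers_a, n_a - carriers_a
    c, d = carriers_b, n_b - carriers_b
    n = n_a + n_b
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 39 carriers among 6422 samples vs. 18 among 7624.
chi2, p = chi2_2x2(39, 6422, 18, 7624)
significant = p < 0.05
```

For small expected counts, an exact test would usually be preferred over this approximation.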
The number of observed variants per megabase was calculated for each chromosome. The effective lengths of chromosomes (excluding unmappable regions) were used for this calculation. Normal distribution of this value across chromosomes was assumed, and the mean and standard deviation were estimated. The probability of the value for each chromosome was assessed with respect to the distribution, and potential outliers (chromosomes with a significantly different ratio of variants per megabase) were identified.
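The per-chromosome outlier check can be sketched as follows, assuming a two-sided normal tail probability and a 0.05 cutoff (the cutoff is an assumption, as the text does not state one). The chromosome labels, variant counts, and effective lengths below are hypothetical and are not the study data.

```python
import math
from statistics import mean, stdev


def density_outliers(counts, effective_mb, alpha=0.05):
    """Flag chromosomes whose variants-per-Mb value is unlikely under a normal fit."""
    densities = {chrom: counts[chrom] / effective_mb[chrom] for chrom in counts}
    mu = mean(densities.values())
    sigma = stdev(densities.values())  # sample standard deviation
    outliers = []
    for chrom, dens in densities.items():
        z = (dens - mu) / sigma
        p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
        if p < alpha:
            outliers.append(chrom)
    return densities, outliers


# Hypothetical variant counts and effective (mappable) lengths in Mb.
counts = {"chr_a": 60, "chr_b": 56, "chr_c": 45, "chr_d": 42, "chr_e": 36,
          "chr_f": 39, "chr_g": 30, "chr_h": 26, "chr_i": 25, "chr_low": 5}
effective_mb = {"chr_a": 200, "chr_b": 180, "chr_c": 150, "chr_d": 140, "chr_e": 130,
                "chr_f": 120, "chr_g": 100, "chr_h": 90, "chr_i": 80, "chr_low": 55}
densities, outliers = density_outliers(counts, effective_mb)
```

Because the outlier itself inflates the estimated standard deviation, this approach is conservative when only a few chromosomes are compared.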
We assessed genome variability according to the size of each chromosome, but size did not seem to be significant. On the other hand, there seemed to be a correlation between chromosomal size and the number of variants, except for chromosome 19, which had fewer variants relative to its size (Table 3).

Discussion
We identified the frequencies of CNVs detected in the selected Central/Eastern European countries and compared them to the CNV frequencies of non-Finnish European populations in the genomic variants databases (DGV and dbVar) and the gnomAD SVs v2.1 database. The sizes of the respective cohorts were comparable for the purposes of this paper: 11,222, 7624, and 6422 for DGV, gnomAD, and our laboratory in-house database, respectively. When choosing population databases, it is necessary to consider that different methods for CNV detection could lead to varying sizes of identical CNVs [28]. DGV was created in 2004 as a comprehensive catalog of human structural variation, initially containing array comparative genomic hybridization (aCGH) data, with the gradual addition of sequencing data, while gnomAD SVs v2.1 was introduced later and is based solely on high-throughput sequencing. We applied merged population databases, with a virtual unification of CNVs in DGV and dbVar coming from different genomic platforms, as described in detail by Zhang et al. (2021).
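Because the same CNV can be reported with different boundaries by different platforms, matching a call against a database entry is often done by reciprocal overlap. The sketch below illustrates that idea; the 50% threshold, the function names, and the coordinates are assumptions for illustration and are not taken from the study.

```python
def reciprocal_overlap(a, b):
    """Shared length divided by the longer interval (half-open start/end coordinates)."""
    (a_start, a_end), (b_start, b_end) = a, b
    shared = max(0, min(a_end, b_end) - max(a_start, b_start))
    return shared / max(a_end - a_start, b_end - b_start)


def same_variant(a, b, min_fraction=0.5):
    """Treat two intervals as the same CNV if both are covered by at least min_fraction.

    The 0.5 default is an assumed threshold, not one reported in the text.
    """
    return reciprocal_overlap(a, b) >= min_fraction


# Hypothetical coordinates on one chromosome.
call = (2_660_000, 3_020_000)        # 360 kb call on a 20 kb grid
db_full = (2_650_000, 3_000_000)     # database entry with slightly shifted boundaries
db_partial = (2_980_000, 3_500_000)  # database entry that only touches the call

full_match = same_variant(call, db_full)
partial_match = same_variant(call, db_partial)
```

Dividing by the longer interval is equivalent to requiring the threshold in both directions, which prevents a small call from matching a much larger database variant.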
The MOV that recurred most often in our dataset was the 6q27 duplication (1.96%), whose frequency was not significantly different from that found among non-Finnish Europeans in the dbVar/DGV database (3.2%) and gnomAD SVs v2.1 (1.4%). Thus far, the three coding genes AFDN, FRMD1, and KIF25 in this locus are not known to be associated with any particular condition. Nevertheless, the genomic region contains at least 141 regulatory elements (enhancers, promoters, TF binding sites, etc.); hence, the gain is more likely a variant of unclear phenotypic effect (VUS) than a benign one.
The 22q11.22 region overlaps two protein-coding genes, PPM1F (OMIM 619309) and TOP3B (OMIM 603582), and more than 30 other genes, mainly from the immunoglobulin lambda variable (IGLV) family of functional genes, pseudogenes, and vestigial sequences interspersed in the IGLV locus [29]. We hypothesize that this polymorphic MOV might be responsible for specific antibody diversity in our region. However, further functional investigations are necessary to confirm this hypothesis. We did not register any overlap with gnomAD SVs v2.1, which initially led us to suspect a sequencing artifact; however, it is a relatively common MOV in the non-Finnish European population in the DGV database (0.7%).
The region of chromosome 15q11-13 is susceptible to genomic rearrangements. Its genomic instability has been attributed to a high density of low-copy number repeats (LCR) mediating aberrant interchromosomal exchanges during meiosis by non-allelic homologous recombination (NAHR) [30].
Small duplications of 15q13.3 containing the CHRNA7 and OTUD7A genes have been reported to yield highly variable phenotypes, ranging from normal to various neurological manifestations. Microduplications of 15q13.3 are equally prevalent in clinical cases and in the general population, making it difficult to assess their contribution to pathogenicity [31]. Speech delay, autistic behaviors, and muscle hypotonia occur in all affected patients, whereas intellectual disability, developmental delay, and epileptic seizures are less common (60%). The cholinergic nicotinic receptor subunit alpha 7 (CHRNA7) gene is a clear candidate for behavioral disorders [24]. Our observations indicate that there is a significant difference between the prevalence in our cohort (0.6%) and the prevalence in European gnomAD samples (0.23%); however, the difference between our prevalence and that found in dbVar/DGV (0.8%) is less notable.
Identically frequent duplication and threefold larger reciprocal deletion of the 4q35.2 region encompassing the three genes ZFP42 (OMIM 614572), TRIML2, and TRIML1 (none of which is morbid) are not associated with any known clinical condition. Surprisingly, they were relatively common in our cohort, while there were no matches in the dbVar and DGV population databases. Moreover, the phenomenon is extremely rare in all gnomAD SVs subpopulations. Applying the ACMG guidelines and the X-CNV predictor, this MOV has been determined to be a variant of unknown significance; however, it has an abundance of regulatory elements. We speculate that both are candidates for population-specific variants; unfortunately, there is not yet any further knowledge regarding medical consequences or perhaps being an adaptive trait.
We recorded no matches in the above-mentioned databases concerning dup 12q24.13-q24.21 (hg19: chr12:114,260,000-14,540,000) and dup 3p26.3 (hg19: chr3:2,660,000-3,020,000). Additionally, 3p26.3 contains the dosage-sensitive gene CNTN4 (OMIM 607280), recently recognized as a risk factor for autism spectrum disorder and other neuropsychiatric disorders of unknown etiology [32,33]. Larger cohorts are needed for comprehensive and unbiased phenotyping and molecular characterization that may lead to a better understanding of the underlying mechanisms of reduced penetrance, variable expressivity, and potential parent-of-origin effect of copy number variations encompassing CNTN4.
The duplicated 19q13.41 locus contains three OMIM genes. ZNF350 is an important gene in human mammary oncogenesis that interacts with BRCA1. Missense polymorphisms can influence the transcriptional activity of the tumor suppressor gene BRCA1, increasing a woman's risk of developing breast cancer [34]. On the other hand, other studies have shown that overexpression of ZNF350 significantly impaired the migration of tumor cells in colorectal cancer and inhibited growth and metastatic activity in cervical cancer, acting as a potent tumor suppressor in different types of cancer [35,36]. FPR1 and FPR2 are G protein-coupled receptors expressed in various immune cells that are involved in many pathological processes. Several genes from the zinc finger family, including ZNF577, ZNF649, ZNF615, ZNF614, and ZNF432, are under intense investigation due to their altered methylation status in various cancer types [37-39].
According to genomic content, dup/del 15q11.2 in our dataset is potentially clinically relevant; however, interpretation is rather conflicting due to low penetrance and reduced expressivity. 15q11.2 is a region prone to chromosomal rearrangement in terms of gains and losses. An Israeli group led by Maya et al. assessed the overall prevalence vs. penetrance in prenatal and postnatal clinically indicated cases and observed no significant difference between the indicated and healthy populations. Their frequencies compared to our observations are not significantly different for duplications (0.78% vs. 0.54%, respectively), unlike deletions, which were less frequent in our pregnant cohort (0.49% vs. 0.2%). Of course, our observation could be biased by the different methodologies used as well as the population's origin. Duplications occurred almost twice as often as deletions. However, the phenotypical penetrance was found to be only 1.16% for duplications and 2.18% for deletions [25]. The less frequent microdeletion 15q11.2, containing four protein-coding genes (NIPA1, NIPA2, CYFIP1, and TUBGCP5), is associated with Burnside-Butler syndrome (BBS), which partially overlaps with certain neurodevelopmental disorders, including Prader-Willi and Angelman syndromes [40]. Twice as common is the "inverted" BBS region, reciprocal to the deletion, which seems to be relatively common (0.54% vs. 0.3%) in DGV but has no match in gnomAD. Interestingly, we found no matches in gnomAD SVs v2.1 for either deletion or duplication of this region, although the finding is well recognized and relatively frequent. We would expect to find these CNVs across all population databases because they occur even in healthy populations without any phenotype.
Considering that the frequency among carriers was almost 0.5% in this study, the Xp22.31 duplication involving four genes (PUDP, STS, VCX, and PNPLA4) appears to be quite common. Reciprocal deletion spanning the STS gene is associated with congenital X-linked ichthyosis (OMIM 308100), together with corneal opacities, cryptorchidism, cardiac arrhythmias, and higher rates of developmental and mood disorders, affecting mainly males [41][42][43]. The phenotypic spectrum of male and female carriers can differ significantly, probably due to X inactivation in females. While the phenotypes associated with deletions are reasonably well characterized, phenotypes with reciprocal duplications exhibit a wide range of medical and neurobehavioral disorders, including autism and cognitive impairment, as well as speech and language difficulties. Paradoxically, there is evidence that high levels of the steroid sulphatase have a minor protective effect against depressive/anxiety behavior via higher levels of steroid hormones in carrier individuals compared to unaffected controls. Female carriers are more likely to suffer from gastroesophageal reflux disease (without esophagitis) than sex-matched controls [44]. We cannot state with certainty that women in our region tend to suffer less from depressive behavior compared to women elsewhere due to the higher prevalence of dup Xp22.31, but it is nevertheless an interesting consideration.
6q26 contains a putative haploinsufficient gene, PRKN (parkin RBR E3 ubiquitin protein ligase). Deletions involving the PRKN gene are associated with Parkinson's disease, autosomal recessive juvenile Parkinson's disease (OMIM 600116), and autism spectrum disorder. The PRKN gene is located at the common FRA6E fragile site, and copy number variants, as well as single nucleotide variants, are frequently detected [45,46]. Since the vast majority of cases are recessive, there is still the possibility of unmasking a heterozygous mutation on the second allele.
The IMMP2L gene (inner mitochondrial membrane peptidase subunit 2) is implicated in Gilles de la Tourette syndrome (GTS, OMIM 137580), Alzheimer's disease, autism spectrum disorder (ASD), schizophrenia, and other neurodevelopmental disorders. GTS, a neuropsychiatric disorder manifested by repetitive involuntary motor and vocal tics that fluctuate in severity, occurs in 0.4% to 3.8% of the population worldwide [47]. GTS is thought to be inherited in an autosomal dominant manner with variable expression and reduced penetrance. Gross deletions at the exon level represent the most common type of mutation (~80%). The genotypic background of these neuropsychiatric disorders is highly heterogeneous; expression and the resulting phenotype may be influenced by environmental or even epigenetic factors through different methylation patterns [48]. IMMP2L might also regulate processes in the reproductive system. This hypothesis was confirmed by the identification of deletions of the IMMP2L gene in patients with primary infertility [49]. Unfortunately, regarding our cohort of pregnant women, we do not have information regarding their inability to conceive naturally, as this information is not mandatory. Nevertheless, it would be valuable to investigate whether women with IMMP2L gene deletions are overrepresented in the IVF population.
Thirty-seven MOVs (11%) were likely pathogenic/pathogenic according to X-CNV, whereas only four (1.2%) were classified identically according to ACMG automated guidelines. The rule-based ACMG guidelines are often considered the gold standard in variant classification. Unfortunately, they lack important information regarding variant segregation in families and specific phenotypes. Moreover, ACMG scoring is also prone to the subjectivity of the evaluator; therefore, we preferred the artificial intelligence (AI)-based computational X-CNV tool to predict pathogenicity, which integrates four categories of features: universal, coding region, noncoding, and genome-wide.
We were curious about the known CNVs in which expression follows the parent-of-origin pattern. Genomic imprinting is a classic epigenetic phenomenon that involves transcriptional silencing of one parental gene allele. There is only one such region of interest in our dataset: the 15q11.2 deletion involving the imprinted NIPA1, NIPA2, CYFIP1, and TUBGCP5 genes, which exhibit neurodevelopmental phenotypes, such as epilepsy, macrocephaly, and autism spectrum disorder, when inherited from the mother, while paternal deletions have been associated with congenital heart disease and abnormal muscle phenotypes [50].
The distribution of all structural variants and MOVs above 200 kb according to chromosomal size was quite even, except for chromosome 19, which seems to have lower variability compared to the other chromosomes. Paradoxically, chr19 has the highest gene density of all chromosomes and the highest GC content, which both predispose it to higher nucleotide variability within and between species [51,52]. However, given the higher gene density, any CNV interspersing this region would logically have a potentially harmful effect.

Conclusions
Building a CNV variant database using NIPT is a very accessible and convenient way of obtaining genomic data for genome-wide population analyses. Some variants may correlate with susceptibility to certain diseases or represent an evolutionary advantage in the context of adaptation to the environment.
Our dataset has the ambition of enriching the existing and any future population databases, which makes it a valuable building block for further research on the functioning of these variants, whether functional or biological, direct or indirect. We are aware that small maternal imbalances below 200 kb were not included in this study because they are beyond our resolution capabilities. We utilized NIPT analyses of pregnant women generated by low-coverage sequencing as a source of population data on recurrent CNVs (MOVs) without any additional costs. Compared to other frequently used population databases, recurrence often differs significantly, even when compared to non-Finnish European populations. Therefore, some MOVs seem to be candidates for population-specific CNVs associated with the Central/Eastern European region. Based on the mappable chromosomal size, we did not record any significant differences in the occurrence of MOVs, except for chromosome 19. Some MOVs potentially have medical consequences, although they may have low penetrance and expressivity.

Funding: This article was created with the support of the OP Integrated Infrastructure for the project: Serious diseases of civilization and COVID-19, ITMS: 313011AVH7, co-financed by the European Regional Development Fund.

Institutional Review Board Statement:
The study was conducted in accordance with the Declaration of Helsinki, and approved by the Ethics Committee of the Bratislava Self-Governing Region on 30 June 2015 (03899/2015/HF).