Copy Number Variation: Methods and Clinical Applications

Gains and losses of large segments of genomic DNA, known as copy number variants (CNVs), have recently gained considerable interest in clinical diagnostics, as particular forms may lead to inherited genetic diseases. In recent decades, researchers have developed a wide variety of cytogenetic and molecular methods with different capabilities for detecting clinically relevant CNVs. In this review, we summarize methodological progress from conventional approaches to current state-of-the-art techniques capable of detecting CNVs from a few bases up to several megabases. Although the recent rapid progress of sequencing methods has enabled precise detection of CNVs, determining their functional effect on cellular and whole-body physiology remains a challenge. Here, we provide a comprehensive list of databases and bioinformatics tools that may serve as useful assets for researchers, laboratory diagnosticians, and clinical geneticists facing the challenge of CNV detection and interpretation.


Introduction
Among the least understood types of genetic variation are copy number variants (CNVs), a class of unbalanced structural variants characterized by deletions, insertions, duplications, or even multiplications of DNA segments ranging in size from a few dozen bp up to several Mb. Currently, the lower limit for CNV length is 50 bp, but this value has been gradually decreasing due to continuous methodological progress. The shift is mainly due to the increased resolution of the methods used, which allows detection of a wider variety of variant lengths and increases CNV detection capacity (Figure 1). Considering this remarkable shift and the fact that the generally used distinguishing criteria are somewhat vaguely defined, it has also been suggested that CNVs include a wider spectrum of variants. However, for practical reasons, we will focus on the conventional concept of CNVs in this review.
Pushing the limits of CNV detection revealed that they are widespread in human populations, with a 5-10% difference of genomic sequences between normal individuals [1-3]. As a significant aspect of our heterogeneity, CNVs may disrupt gene function or alter gene dosage by direct gain or loss of coding sequences [4], but several indirect mechanisms, including alteration of non-coding RNAs [5,6] and topologically associated domains [7], have also been described. Since these may affect the phenotype, CNVs may threaten the ability to survive or, on the contrary, enhance chances of survival in disadvantageous environments [8]. CNVs are an important cause of genomic disorders with Mendelian inheritance, and may also contribute to complex diseases with multifactorial etiology [9]. Since the introduction of high-throughput technologies for CNV detection, namely array-based comparative genomic hybridization (aCGH) and massively parallel sequencing (MPS), the number of novel variants has been constantly increasing. However, many detected CNVs are still categorized as variants of uncertain significance (VUS) with unknown clinical impact [4], suggesting a need for their reliable classification. Therefore, all available information has been translated into the standards for interpretation and reporting of constitutional CNVs recently published by the American College of Medical Genetics and Genomics and the Clinical Genome Resource [10]. However, the age of high-throughput technologies comes with an ever-increasing amount of generated data. Researchers must continuously improve bioinformatic software and decision support tools to help clinicians handle the problem.

Methods of CNV Detection
From conventional cytogenetic methods through hybridization- and PCR-based techniques, up to MPS, CNV detection methods have been through a long evolution, affecting several aspects of progress in CNV research (Figure 1). Cytogenetic techniques, based on visual inspection of chromosomes, were the first methods for CNV detection. Improvements led to the gradual lowering of detection limits, from numerical anomalies of whole chromosomes to CNVs of a few Mb in size. The introduction of molecular-biology methods, especially hybridization followed by Southern blotting, allowed detection of mid-sized CNVs in the range of several kb. Later, amplification-based PCR methods, together with their modifications and a wide range of associated detection techniques, brought analytical resolution down to single nucleotides, with upper limits of the detection range at hundreds of kb or a few Mb. Completely new possibilities were introduced by combining molecular hybridization techniques with cytogenetic methods and with microarray-based methods, and also by the invention of MPS. The latter two allowed the analysis of the whole size range of CNVs in single runs, at least theoretically, and at the scale of whole genomes.

Cytogenetic Techniques and Their Most Common Modifications
Although chromosomes in plant and animal cells were first observed in the 19th century [11], and CNVs were microscopically detected in Drosophila in the early 20th century [12], it took the first half of the 20th century to assess the human diploid karyotype [13]. This was finally made possible by several methodological improvements in karyotyping, leading to the establishment of conventional cytogenetic techniques that are still in general use. These include the use of cells cultured from the tested tissue, arrest of dividing cells in metaphase by colchicine, treatment with a hypotonic solution to spread the chromosomes, fixation of chromosomes on a glass slide for examination under a light microscope, and subsequent counting and grouping of chromosomes according to their morphological features [14]. A revolutionary step in human cytogenetics came with the introduction of different chromosome banding techniques revealing specific chromosomal patterns, including fluorescence-based Quinacrine banding (Q-banding) [15] and Giemsa staining (G-banding) [16], which have become the most widely used banding methods. Following the advent of cytogenetic and banding techniques, discoveries were quickly made with regard to CNVs associated with human pathologies. However, karyotyping techniques available in the 1960s only allowed detection of gross numerical and morphological abnormalities, because the resolution of light microscopes was limited to imbalances larger than 5 Mb.
Conventional cytogenetic techniques combined with molecular techniques such as hybridization led to the emergence of molecular cytogenetics, the main methods being fluorescence in situ hybridization (FISH) [17] and comparative genomic hybridization (CGH), both still requiring fluorescent microscopy [18]. FISH is based on hybridization of sequence-specific fluorescently labelled probes with subsequent microscopic detection of a given fluorescent signal that indicates the presence or absence of specific target DNA sequences [17]. This technique has undergone several modifications, from single-event-specific tests up to chromosome painting, making it possible to detect individual loci as small as 10 kb [19]. Certain limitations of conventional cytogenetics and FISH, mainly those of resolution, led to the development of CGH for CNV detection [18]. By comparing fluorescent signals generated from DNA of the tested and control samples along the chromosomes to which they were hybridized, CGH is capable of identifying increased or decreased copy numbers of sequences at least ~3-10 Mb in size [20].

Methods of Molecular Biology
Molecular-biology methods such as Southern blot hybridization offer higher resolution than cytogenetics [21]. The principle of Southern blotting relies upon fragmentation of DNA with a restriction endonuclease and separation of the fragments by gel electrophoresis. The fragments are transferred to a membrane and hybridized to appropriate probes. Copy number changes are visible as differential hybridization intensities or as altered mobility of the fragments. Although Southern blotting was for many years the standard method for the detection of deletions or amplifications in the range of 5-500 kb [22], it is a laborious, time-consuming method that requires large amounts of high-quality DNA [23].
A tremendous improvement in the screening of CNVs came with the introduction of microarray-based methods, specifically in conjunction with comparative genomic hybridization, where DNA samples extracted from the tested and reference cells are cohybridized to an array of fixed oligonucleotide probes instead of metaphase chromosomes [24]. aCGH provides genome-wide coverage at a much higher resolution of 10-25 kb [25] or even >500 bp if high-density arrays are used [26]. Despite some limitations in resolution and accuracy, this made aCGH a standard in CNV detection [27].
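To make the principle of aCGH calling more concrete, the following Python sketch converts test/reference probe intensities into log2 ratios and reports runs of consecutive probes that deviate from zero. It is only an illustration under stated assumptions: the function names, the ±0.3 threshold, and the minimum run length of three probes are hypothetical choices, not values taken from the cited studies, and real pipelines add normalization and formal segmentation.

```python
import math

def log2_ratios(test, reference):
    """Per-probe log2(test / reference) intensity ratio."""
    return [math.log2(t / r) for t, r in zip(test, reference)]

def expected_log2_ratio(copies, ploidy=2):
    """Theoretical log2 ratio for a non-mosaic change in a diploid genome."""
    return math.log2(copies / ploidy)

def call_runs(ratios, threshold=0.3, min_probes=3):
    """Return (first_probe, last_probe, label) for runs of at least
    min_probes consecutive probes beyond +/- threshold (assumed cut-offs)."""
    def label(x):
        if x > threshold:
            return "gain"
        if x < -threshold:
            return "loss"
        return None

    calls, start, current = [], 0, None
    for i, x in enumerate(ratios + [0.0]):  # sentinel closes the last run
        lab = label(x)
        if lab != current:
            if current is not None and i - start >= min_probes:
                calls.append((start, i - 1, current))
            start, current = i, lab
    return calls

# Hypothetical intensities: probes 3-5 show roughly half the reference signal.
test_signal = [100, 105, 98, 50, 52, 49, 101, 150]
reference_signal = [100] * 8
print(call_runs(log2_ratios(test_signal, reference_signal)))
```

Requiring several consecutive probes is a simple guard against single-probe noise; the isolated high value at the last probe in the example is therefore not reported.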
PCR-based methods may come either in the form of conventional two- or three-primer based protocols or as multiplexed assays. CNVs may be detected by PCR-based protocols through a change in: (i) migrational properties during agarose gel or other types of electrophoresis [28]; (ii) the number of amplification cycles required to achieve a relative threshold fluorescent intensity in real-time quantitative PCR (qPCR) assays [29,30]; (iii) relative fluorescent signal intensities in capillary electrophoresis when using quantitative-fluorescent PCR (QF-PCR) [31] or multiplex ligation-dependent probe amplification (MLPA) techniques [32]; (iv) denaturation properties reflected in melting temperatures or melting curve shapes during conventional or high-resolution melting analysis [33]. All of these methods are more or less convenient for the targeted detection of a limited number of CNVs over a relatively wide range of lengths, from tens of bp up to Mb and even whole chromosomes, at low cost and with fast turnaround time [34]. However, each of these methods also has its own advantages and limitations. A more recently introduced alternative to traditional qPCR in CNV detection is droplet digital PCR (ddPCR). In this method, template DNA is diluted and partitioned into thousands of nano-scale droplets of uniform volume, allowing for absolute quantification of target copy numbers without the need for a standard assay, making results easier to interpret and less error-prone than regular qPCR [35]. Another PCR-based method is multiplex amplifiable probe hybridization (MAPH), which uses oligonucleotide probes that hybridize to a specific region in the genome. Hybridized probes are amplified, and the amount of each amplification product is proportional to the copy number of the corresponding sequence [36]. MAPH enables the sensitive detection of CNVs as small as 150 bp [37]. Even more sensitive and easier to use is MLPA, developed to determine the copy number of multiple genomic DNA sequences (up to 60 probes) in a single reaction, with resolution down to a single nucleotide difference. The probes hybridized to the sample DNA are ligated and amplified, resulting in fragments of unique lengths that can be separated and quantified by capillary electrophoresis [32]. MLPA is therefore a cost-effective method that can be performed with equipment present in most molecular biology laboratories.
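The quantitative reasoning behind qPCR- and ddPCR-based copy number estimation can be sketched in a few lines of Python. The example below assumes the commonly used 2^-ddCt calculation with roughly 100% amplification efficiency for qPCR, and a Poisson correction of the positive-droplet fraction for ddPCR; all function names, Ct values, and droplet counts are hypothetical and serve only as an illustration, not as a validated assay protocol.

```python
import math

def qpcr_copy_number(ct_target_sample, ct_ref_sample,
                     ct_target_calibrator, ct_ref_calibrator,
                     calibrator_copies=2):
    """Relative copy number by the 2^-ddCt method.
    Assumes ~100% PCR efficiency for both target and reference assays."""
    d_ct_sample = ct_target_sample - ct_ref_sample
    d_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    dd_ct = d_ct_sample - d_ct_calibrator
    return calibrator_copies * 2 ** (-dd_ct)

def ddpcr_copies_per_droplet(positive, total):
    """Poisson-corrected mean number of template copies per droplet."""
    if not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total droplets")
    return -math.log(1 - positive / total)

def ddpcr_copy_number(pos_target, pos_reference, total, reference_copies=2):
    """Target copy number relative to a reference locus of known copy number."""
    lam_target = ddpcr_copies_per_droplet(pos_target, total)
    lam_reference = ddpcr_copies_per_droplet(pos_reference, total)
    return reference_copies * lam_target / lam_reference

# Hypothetical qPCR run: the target crosses the threshold one cycle later
# than in a two-copy calibrator, consistent with a single-copy loss.
print(qpcr_copy_number(26.0, 25.0, 25.0, 25.0))

# Hypothetical ddPCR run with 20,000 droplets per assay.
print(round(ddpcr_copy_number(3000, 5700, 20000), 2))
```

The Poisson step matters because a positive droplet may contain more than one template molecule, so the raw fraction of positive droplets underestimates the true concentration at higher template loads.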
Although the first generation of DNA sequencing (1GS) technologies, specifically Sanger sequencing, was generally considered to be the gold standard in DNA diagnostics, CNVs represent specific challenges not easily dealt with when using this method. Their detectability strongly depends on the length and type of the CNV, as well as on its position with regard to the amplification primers used (for more details on the possible effect of CNVs on reliability, see the next subsection). Since 2005, when the first platforms of second-generation sequencing (2GS) technology became available [38], methods based on MPS have undergone several modifications, with their cost steadily decreasing [39]. Currently, 2GS represents a valuable tool for clinical diagnostics and provides a sensitive and accurate approach for the detection of the major types of genomic variations, including CNVs [40]. There are three main strategies for 2GS-based CNV analysis, namely whole-genome, whole-exome, and targeted sequencing. Due to the limited length of DNA fragments sequenced by 2GS sequencers, variation is detected as abnormalities in the affected areas using robust statistical and bioinformatic processing [41]. Read-depth methods highlight regions with an irregular number of sequenced fragments: a loss is seen as a lower, and a gain as a higher than expected amount of a particular segment. Read-pair and split-read approaches analyze sequenced fragments with discordant alignments, where read pairs or portions of a single read are aligned to unexpected sites in the reference genome. While the above methods directly analyze reads mapped to the reference genome, assembly-based methods compare longer sections of an individual's genome, called contigs. This approach may reveal more complex genome rearrangements, but genome assembly is computationally more intensive and requires substantially higher capacity. Whole-genome sequencing combined with sophisticated computational strategies improved CNV detection, allowing even base-pair resolution of breakpoints [42]. On the other hand, whole-exome sequencing targets only the protein-coding part of the genome. However, since most of the known disease-causing mutations fall into this category, exome sequencing significantly reduces sequencing cost in medical applications and is still sufficiently powerful. Moreover, targeted sequencing provides a greater depth of coverage in regions of interest for an even lower cost [43].
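The read-depth idea described above can be illustrated with a minimal Python sketch: reads are counted in fixed-size bins, each profile is scaled by its median, and the sample/control ratio of each bin is compared with cut-offs. The bin size, the cut-offs of 0.75 and 1.25 (midpoints between the ratios expected for one, two, and three copies), and all names are illustrative assumptions; production tools additionally correct for GC content and mappability and apply statistical segmentation.

```python
from statistics import median

def bin_counts(read_starts, chrom_length, bin_size):
    """Count reads per fixed-size bin from 0-based read start positions."""
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    counts = [0] * n_bins
    for pos in read_starts:
        counts[pos // bin_size] += 1
    return counts

def depth_ratios(sample_counts, control_counts):
    """Sample/control ratio per bin after scaling each profile by its median.
    Bins with zero control coverage yield None (no call possible)."""
    s_med = median(sample_counts)
    c_med = median(control_counts)
    return [(s / s_med) / (c / c_med) if c > 0 else None
            for s, c in zip(sample_counts, control_counts)]

def classify(ratio, loss_below=0.75, gain_above=1.25):
    """Label one bin; the cut-offs are assumed, not taken from a published tool."""
    if ratio is None:
        return "no-call"
    if ratio < loss_below:
        return "loss"
    if ratio > gain_above:
        return "gain"
    return "neutral"

# Hypothetical per-bin read counts for a sample and a matched control.
sample = [100, 98, 51, 49, 102, 150]
control = [100, 100, 100, 100, 100, 100]
print([classify(r) for r in depth_ratios(sample, control)])
```

Scaling by the median rather than the mean keeps the normalization stable when a sizeable fraction of bins is itself affected by a copy number change.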
Third-generation sequencing (3GS) technologies (e.g., single-molecule real-time sequencing [44] and nanopore sequencing [45]) bring promise for better characterization of genomic structural variants due to longer reads [46] that can be more confidently aligned to repetitive sequences, which often mediate the formation of structural variants [47]. While both microarray and 2GS techniques are based on complex laboratory procedures that require several days to obtain results [48], nanopore-based 3GS provides pocket-sized, low-cost devices that usually take from 24 to 48 h to run, with reads generated continuously, so data can be used for processing and further analysis in real time during the ongoing sequencing process [45]. Moreover, the method can be combined with a rapid library preparation kit capable of obtaining ready-to-sequence genomic DNA in 10 min. Data generated in the first tens of minutes of a run are sufficient to detect large chromosomal alterations with a resolution in the order of tens of Mb. Data produced in the first 6-12 h of a sequencing run can be used to identify CNVs with an accuracy comparable to currently available array-based methods, and are capable of predicting the allelic fraction of genomic alterations with high accuracy [42]. The problem with CNV breakpoint identification often encountered in PCR- or array-based methods can also be solved by breakpoint sequencing [49]. Using 3GS devices, it will soon be possible to perform cost-effective high-resolution molecular karyotyping of the human genome within an hour from sample extraction, allowing ultra-fast analyses in fields where time matters, such as precision oncology and prenatal diagnostics [42].
When considering in silico tools to extract CNV genotype information from generated data, nearly all of the available methods have their dedicated commercial tools, from cytogenetic karyotyping, through MLPA, up to aCGH. The bioinformatic tools for processing MPS data are, however, still under intensive development and diversification (Table 1). While both 2GS and 3GS are technically capable of detecting CNVs over a wide range of lengths, not every size is identifiable using the same bioinformatics pipeline, and different variants may require differently suited tools [50]. To identify smaller structural variations spanning several nucleotides, conventional variant callers, such as the GATK HaplotypeCaller [51], are generally suitable, while large CNVs exceeding read lengths are typically identified based on a disproportion of sequenced reads from the genomic region of a particular CNV [52]. In conclusion, since each CNV detection method has its advantages and limitations, the choice of an appropriate technique depends on the application, required resolution, available lab equipment, workload, and budget.

Tool | Description | Operating System | Availability | Reference
… | … | Mac OS X, Linux | Free for noncommercial use | [53,54]
ExomeCNV | ExomeCNV is based on an algorithm using statistics of sequence coverage and B-allele frequencies for CNV and loss of heterozygosity estimation by mapping short sequence reads. ExomeCNV was the first tool implemented to detect CNVs from WES data. | MS Windows, Mac OS X, Linux | Free-software license | [55]
SavvyCNV | A tool that uses off-target or non-target read data from targeted panel and exome sequencing to call CNVs genome-wide. SavvyCNV may call CNVs with high precision and recall. | MS Windows, Mac OS X, Linux | Free-software license | [56]
CopywriteR | A tool that can generate high-quality DNA copy number profiles using off-target reads from targeted sequencing data. In addition, CopywriteR allows extracting accurate copy number information without a reference. | MS Windows, Mac OS X, Linux | Free-software license | [57]
DECoN | A fast and accurate tool for exon CNV detection from whole exons in targeted panel analysis, capable of detecting small intra-exon variants. It provides quality checks and visualization to make it suitable for clinical use. | MS Windows, Mac OS X, Linux | Freely available | [58]
CNVkit | A software toolkit for detection, analysis, and visualization of CNVs, able to estimate CNVs and alterations genome-wide from high-throughput sequencing data. It implements a pipeline for CNV detection that takes advantage of both on- and off-target reads and applies a series of corrections to improve copy number calling accuracy. | Mac OS X, Linux | Free-software license | [59]
Canvas SPW | Canvas SPW (Small Pedigree Workflow) is a tool for CNV calling that serves to identify germline and de novo CNVs from pedigree sequencing data. In addition, it infers genome-wide parameters such as cancer ploidy, purity, and heterogeneity. | MS Windows, Linux | Free-software license | [60]
MFCNV | A computational method that (i) considers the intrinsic correlations among adjacent positions in the genome, (ii) calculates read depth, GC-content bias, base quality, and correlation value for each genome bin, and (iii) trains a neural network algorithm to predict CNVs. | NA | Free-software license | [61]
VarScan 2 | Analysis tool for the detection of somatic mutations and CNVs in exome data from tumor-normal pairs. The algorithm reads data from both samples simultaneously; a heuristic and statistical algorithm detects sequence variants and classifies them by somatic status (germline, somatic, or LOH), while a comparison of normalized read depth delineates relative copy number changes. | MS Windows, Mac OS X, Linux, UNIX | Free for noncommercial use | [62]
ADTEx | ADTEx (Aberration Detection in Tumour Exome) is a method to infer somatic CNVs and genotypes using WES data from paired tumour/normal samples. The algorithm uses hidden Markov models to predict CNV counts, genotypes, polyploidy, aneuploidy, cell contamination, and baseline shifts. | Linux | Free-software license | [63]
ReadDepth | An R package for inferring CNVs from short-read sequencing data. The algorithm uses a statistical model that accounts for overdispersed data and does not require reference sample data. It includes a method for increasing the resolution from low-coverage experiments by utilizing breakpoint information from paired-end sequencing to do positional refinement. For calling somatic CNVs from matched tumor/normal pairs, the authors of ReadDepth recommend the copyCat package, which is loosely based on ReadDepth. | MS Windows, Mac OS X, Linux | Free-software license | [64,65]
CONDEL | CONDEL (CONsensus DELeteriousness) is a method for detecting CNVs from single tumor samples using high-throughput sequence data. It utilizes a novel statistic in combination with a peel-off scheme to assess the statistical significance of genome bins, and adopts a Bayesian approach to infer copy number gains, losses, and deletion zygosity based on statistical mixture models. | MS Windows, Mac OS X, Linux | Freely available | [66]
CNV_IFTV | A method that uses a novel isolation forest algorithm and variation-based detection of CNVs from short-read sequencing data. It is a reliable tool even for low-level coverage and tumor purity. | MS Windows, Mac OS X, Linux | Freely available | [67]
Control-FREEC | A tool for detection of copy-number changes and allelic imbalances (including LOH) using deep-sequencing data. Control-FREEC automatically computes, normalizes, and segments copy number and beta allele frequency profiles, then calls CNVs and LOH. The control sample is optional for WGS data but mandatory for WES or targeted sequencing data. | MS Windows, Linux | Free-software license | [68]
EXCAVATOR, EXCAVATOR2 | EXCAVATOR (EXome Copy number Alterations/Variations annotATOR), a tool for the detection of CNVs from WES data, combines a three-step normalization procedure with a hidden Markov model algorithm and a calling method that classifies genomic regions into five copy number states. EXCAVATOR2 is an enhanced version of EXCAVATOR. It is a read-count-based tool that exploits all the reads produced by WES experiments to detect CNVs with a genome-wide resolution. | Mac OS X, Linux | Freely available | [69,70]
XCAVATOR | A software package for the identification of genomic regions involved in CNVs from short and long reads in whole-genome sequencing experiments. | Mac OS X, Linux | Free-software license | [71]

NA, not available. If applicable, operating systems for individual tools were collected from https://bioinformaticshome.com. If applicable, availability/license information was collected from https://github.com or from the home page of individual tools.

Techniques Possibly Affected by the Presence of Undetected CNVs
In addition to detection possibilities, another aspect worth discussing is that certain methods are at risk of giving inferior results due to the presence of undetected CNVs. Such methods include Southern blotting and PCR as well as both 1GS and 2GS. PCR and PCR-based sequencing methods are prone to allelic dropout caused by the presence of deletions in the analyzed region, especially if these affect one of the primer binding regions, or may falsely show hemizygous instead of homozygous alleles if the entire amplified region is deleted. They may also be affected by the presence of false-positive variants, such as unknown homologous copies of the analyzed region (e.g., pseudogenes or pseudoexons) with high but not full sequence homology, as in the case of the CFTR pseudoexon 2 present in GRCh38, but not in earlier versions of the human reference genome [50].
Some of these effects may be prevented, eliminated, or at least attenuated in some ways, depending on whether the presence of a certain CNV is expected or unforeseen. These approaches include but are not limited to: (i) checking the region of interest for specific CNVs by an alternative technique (e.g., sequencing of single genes may be complemented by MLPA, while sequencing of whole exomes and genomes may incorporate a CNV-specific bioinformatic variant calling pipeline to complement conventional variant calling of small variants); (ii) using two or more complementary assays based on different principles and liable to different biases; (iii) careful evaluation and reporting of results by well-trained users who are familiar with the technique used, including thorough quality control and reporting of only unambiguous findings truly supported by the results (for example, not reporting variants as homozygous when detected using sequencing with PCR preamplification, unless other heterozygous variants were detected in the same amplicon, or until the possible presence of CNVs has been checked); or (iv) at least disclosing the possible biases in the results.

Potential Biomedical Applications of CNV Detection
CNVs can be analyzed from different biological sources, offering various valuable information, so there are plenty of biomedical applications where CNV detection may be useful. CNVs have been studied in neuropsychiatric [72,73], developmental [74], and cardiovascular diseases [75]. Several studies have identified the role of CNVs in common diseases such as coronary artery disease or in rarer events such as sudden cardiac death. Such findings may be useful for clinicians for disease classification and detection in the future, particularly in the age of whole-genome sequencing [76].
On the other hand, CNVs have been identified as susceptibility factors for autoimmune diseases such as systemic lupus erythematosus (SLE). The human C4 gene is one of the most striking examples of genetic diversity, due to a great variation in the number and size of gene copies between individuals. Low copy numbers of the C4 and C4A genes are significant risk factors for the development of SLE in different populations. A meta-analysis by Li et al. showed that <4 copies of the C4 gene increase susceptibility to autoimmune diseases with an odds ratio of 1.46 (95% CI, 1.19-1.78) [77]. In addition, C4A has been associated with disease severity. Thus, determination of C4 gene copy numbers may be useful in sub-phenotyping and managing SLE patients [78].
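For readers less familiar with how an odds ratio and its confidence interval arise, the following Python sketch computes both from a single 2x2 table using the Wald interval on the log scale. The counts are entirely hypothetical and are not taken from the meta-analysis cited above, which pooled several studies with methods not reproduced here; the function name is likewise an assumption.

```python
import math

def odds_ratio_ci(exposed_cases, unexposed_cases,
                  exposed_controls, unexposed_controls, z=1.96):
    """Odds ratio with a Wald confidence interval computed on the log scale.
    'Exposed' here means carrying a low gene copy number (e.g., <4 copies)."""
    a, b = exposed_cases, unexposed_cases
    c, d = exposed_controls, unexposed_controls
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts: 120 of 500 patients and 90 of 500 controls
# carry fewer than four copies of the gene.
estimate, lower, upper = odds_ratio_ci(120, 380, 90, 410)
print(f"OR = {estimate:.2f} (95% CI, {lower:.2f}-{upper:.2f})")
```

An interval whose lower bound stays above 1 indicates that the association is unlikely to be explained by sampling variation alone at the chosen confidence level.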
CNVs obtained from blood cells or tissues are suitable for the identification of germline or somatic variants. Tissue biopsy is a well-established procedure in cancer diagnosis for identification of human genomic alterations. However, this technique is invasive, time-consuming, not sufficient to examine the entire tumor profile, and not applicable in the follow-up of cancer treatment [79]. The current trend is moving towards non- or less-invasive sampling, such as liquid biopsy [80]. In combination with whole-genome copy number analysis, which does not require any prior knowledge about the characteristics of the primary tumor genome, it represents a promising clinical tool. Heitzer et al. reviewed approaches for analyses of somatic copy number alterations at a genome-wide scale [81]. Both circulating tumor cells (CTCs) and cell-free DNA (cfDNA) were shown to be powerful sources in CNV profiling.
Ni et al. hypothesize that copy number changes are key events of metastasis. They observed cancer-associated CNVs in exomes of CTCs, revealing information needed for individualized therapy, such as drug resistance and phenotypic transition, and suggest that CNVs at certain genomic loci have the potential for CTC-based cancer diagnostics [82]. Several studies demonstrated that the detection of ALK gene rearrangement in non-small-cell lung cancer (NSCLC), a predictive biomarker for crizotinib treatment, may be performed using CTCs. The same group also reported that CTCs can be used for sensitive detection of ROS1 rearrangement in NSCLC patients. CTCs from ROS1-rearranged patients show heterogeneity of ROS1 gene abnormalities and elevated numerical chromosomal instability, suggesting a potential mechanism for resistance to crizotinib, a known ROS1 inhibitor [83].
Since tumor cells frequently undergo necrosis, they release tumor-specific cfDNA (ctDNA) into body fluids such as blood, urine, and saliva [84]. It was shown that quantification of tumor-specific rearrangements in ctDNA by ddPCR is highly accurate for postsurgical discrimination between patients with an eventual diagnosis of clinical metastasis and long-term disease-free patients, with a sensitivity of 93% (95% CI, 66-100%) and specificity of 100% (95% CI, 61-100%). Moreover, ctDNA-based detection preceded clinical detection of metastasis in 86% of patients with an average lead time of 11 months, whereas patients with long-term disease-free survival had undetectable ctDNA postoperatively [85]. Peng et al. presented a method enabling CNV detection from a 150-gene panel using a low amount of ctDNA. They demonstrated that their CNV pipeline can detect EGFR, ERBB2, and MET amplification from ctDNA samples with high specificity and concordance with corresponding tissue-based whole-exome results. The concordance rate for EGFR, ERBB2, and MET CNVs was 78%, 89.6%, and 92.4%, respectively [86]. The analysis of circulating nucleic acids may also be helpful in other diseases. Since cfDNA biomarkers are known to be important in many autoimmune and multifactorial diseases such as IBD [87], cfDNA could also be used for studying CNVs in such disorders.
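Performance figures of this kind are derived from a simple confusion table, and the wide confidence intervals reflect small cohort sizes. The Python sketch below shows the arithmetic with hypothetical counts and a Wilson score interval; the counts are not taken from the cited study, and the source does not state which interval method its authors used, so the Wilson interval here is an assumption chosen because it needs only the standard library.

```python
import math

def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (assumed method)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)

# Hypothetical cohort: 14 patients who later developed metastasis
# (13 ctDNA-positive) and 6 long-term disease-free patients (all negative).
sens, spec = sensitivity_specificity(tp=13, fn=1, tn=6, fp=0)
print(f"sensitivity {sens:.0%}, CI {wilson_interval(13, 14)}")
print(f"specificity {spec:.0%}, CI {wilson_interval(6, 6)}")
```

Even a perfect point estimate of 100% carries a lower confidence bound far below 100% when it rests on only a handful of patients, which is why such results need confirmation in larger cohorts.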
CNVs are also useful in the diagnostics of rare and common diseases or predispositions. This may be performed as prenatal testing through direct testing of the fetus or indirectly using maternal blood. Detection of CNVs is a common part of modern noninvasive prenatal testing (NIPT), most commonly based on low-coverage whole-genome sequencing analysis of cell-free fetal DNA (cffDNA) from maternal plasma [88]. This approach is useful for the detection of chromosomal aneuploidy and microdeletion syndromes, including DiGeorge, Prader-Willi/Angelman, 1p36, Cri-du-chat, and Wolf-Hirschhorn syndrome [52]. Apart from fetal CNVs, maternal ones can also be detected by this method, although current analyses generally do not interpret these findings. Maternal aberrations are potentially harmful to the fetus, so some authors suggest reporting these variants if clinically relevant. On the other hand, performing NIPT may lead to the incidental diagnosis of maternal diseases, such as previously unrecognized pathologies, late-onset diseases and predispositions arising from maternal germline CNVs, or malignancies and systemic autoimmune diseases presenting with somatic CNVs. Thus, these aspects of CNV detection also affect the conventional perception of incidental and secondary findings arising via genetic testing, which are now extensively discussed [89]. Giles et al. reported that 80% of genetic counselors recognized it would be beneficial to use NIPT for neoplasm screening, yet more than 90% affirmed that guidelines are necessary to prepare for such situations [90].
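One widely used way to flag a chromosomal imbalance in low-coverage sequencing data is to compare the fraction of reads mapping to a chromosome with the same fraction in a set of euploid reference samples, expressed as a z-score. The Python sketch below illustrates this idea only; it is not necessarily the method of the studies cited above, the cut-off of 3 and the read fractions are hypothetical, and the same logic would have to be applied to much smaller bins to approach microdeletion-sized events.

```python
from statistics import mean, stdev

def chromosome_fraction(reads_on_chromosome, total_reads):
    """Fraction of all mapped reads that fall on the chromosome of interest."""
    return reads_on_chromosome / total_reads

def z_score(sample_fraction, reference_fractions):
    """Standardized deviation of the sample from euploid reference samples."""
    mu = mean(reference_fractions)
    sd = stdev(reference_fractions)
    return (sample_fraction - mu) / sd

def interpret(z, cutoff=3.0):
    """Map a z-score to a screening flag; the cut-off is an assumed value."""
    if z > cutoff:
        return "gain suspected"
    if z < -cutoff:
        return "loss suspected"
    return "no imbalance detected"

# Hypothetical chromosome 21 read fractions in euploid reference pregnancies.
reference = [0.0130, 0.0131, 0.0129, 0.0130, 0.0132, 0.0128]
sample = chromosome_fraction(137_000, 10_000_000)
print(interpret(z_score(sample, reference)))
```

Because only a minor share of the cell-free DNA is of fetal origin, the expected shift in read fraction is small, which is why the reference distribution must be estimated carefully and why a screening flag is followed by confirmatory testing.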
CNV detection may also find application in the evaluation of the microbiome balance, through the analysis of CNVs in metagenomes in different body parts. The human microbiome interacts with the host and plays an important role in many host biological processes [91]. Host genomic variations influence the composition of the microbiome, which in turn affects the health of the individual. While numerous studies have focused on associations between the gut microbiome and specific alleles of the host genome, gene copy number also varies. It was shown, for instance, that duplication of the human AMY1 gene is associated with an increased number of oral Porphyromonas in saliva, which is linked to periodontitis. The gut microbiota of these individuals had an increased abundance of resistant starch-degrading microbes, produced higher levels of short-chain fatty acids, and drove higher adiposity when transferred to germ-free mice [92]. This case demonstrated that even seemingly harmless variants in the host genome could affect the health of an individual.
Current knowledge suggests that it is important to analyze CNVs not only in human cells, but also in the microbiome. Taxonomic characterization of the human microbiota is often limited to the species level; however, each microbial species represents a large collection of strains that may contain considerably different sets or copy numbers of genes, resulting in potentially distinct functional capacities. This intra-species variation is caused by deletion and duplication events, which were shown to be prevalent in the human gut environment, with some species exhibiting CNVs in >20% of their genes. This variability is especially prevalent in disease-associated genes involved in important functions, such as transport and signaling. A study by Greenblum et al. showed obesity to be associated with higher copy numbers of thioredoxin 1 in Clostridium sp. and with an increased copy number of an MFS transporter gene in the Roseburia inulinivorans genome cluster, and IBD to be associated with increased HlyD in Bacteroides uniformis. According to the authors, the analysis of species composition alone is insufficient to capture the true functional potential of the microbiome because it may fail to capture important functional differences, so the analysis of intra-species variation in microbial communities is crucial [93].

Clinical Interpretation of CNVs
As detailed above, CNVs are an important source of normal and pathogenic variation. Pathogenic CNVs are typically large and contain multiple genes, and are significantly enriched in developmental genes and genes with greater evolutionary copy number conservation across mammals. On the other hand, genes found in benign CNVs have more variable copy numbers, suggesting that dosage sensitivity of genes is a predominant causative factor for CNV pathogenicity [94]. In everyday practice, laboratory diagnosticians, genetic counselors, and clinical geneticists need to distinguish pathogenic CNVs from benign ones in their patients, and such interpretation can be challenging. Many recurring CNVs are already classified into one of the five main classes of clinical impact (benign, likely benign, VUS, likely pathogenic, and pathogenic), a unified system commonly used for the interpretation of other sequence variants as well [95]. However, progress in the detection of CNVs has resulted in a growing number of novel CNVs that need further analysis to determine their potential clinical impact. Between the two clear extremes (benign and pathogenic), a wide spectrum of CNVs lacking evidence to support their clinical significance are classified as VUS [4]. This has led to a demand for more convenient annotation and classification of such CNVs. Even though predicting the clinical impact of CNVs is a challenge, several in silico prediction or decision support tools for CNV classification (Table 2) are available to help laboratory diagnosticians, genetic counselors and clinicians [96].

Tool | Description | Operating System | Availability | Reference
AnnotSV | A standalone program designed for annotating and ranking SVs. The tool compiles functional, regulatory and clinically relevant information and aims at providing annotations useful to (i) interpret the potential pathogenicity of SVs and (ii) filter out potential false positives. | MS Windows, Mac OS X, Linux | Free-software license | [97]
iCopyDAV | Integrated platform for CNV detection, annotation and visualization enabling the user to identify CNVs in whole-genome NGS data. iCopyDAV consists of seven modules for (i) calculating optimal bin size; (ii) data preparation; (iii) data pre-treatment; (iv) segmentation; (v) variant calling; (vi) CNV annotation; (vii) plotting CNVs across the chromosome. | Mac OS X, Linux | Freely available | [98]
AluScanCNV2 | An R package for CNV calling and machine learning-based cancer risk prediction with NGS data. It uses Geary-Hinkley transformation-based comparison of the read-depth. | MS Windows, Mac OS X, Linux | Free-software license | [99]
CNVAnnotator | A web service that displays genomic overlaps of the input coordinates with built-in databases of CNVs and SNPs from genome-wide association studies and additional features such as ENCODE regulatory elements, cytobands, segmental duplications, genome fragile sites, pseudogenes, promoters, enhancers, CpG islands, and methylation sites. | MS Windows, Mac OS X, Linux | Free access. Results are free to academic research. Not for profit | [100]
cnvScan | A CNV screening and annotation tool to improve the clinical utility of computational CNV prediction from exome sequencing data. The screening step evaluates CNV prediction using quality scores and refines it using an in-house CNV database. The annotation step uses multiple external databases from three groups: gene and functional effect datasets, known CNVs from public databases and clinically significant datasets. | | |
SG-ADVISER-CNV | A suite (consisting of an annotation pipeline and a Webserver) for CNV detection and interpretation by performing in-depth annotations and functional predictions for CNVs. The tool is designed to allow users with no prior bioinformatics expertise to handle large volumes of CNV data. | MS Windows, Mac OS X, Linux | NA | [105]
ClinTAD | A browser-based tool for quick evaluation of the clinical relevance of a CNV in the context of TADs. It allows the user to input a chromosome number, genomic coordinates, and phenotypic information and relate these data to nearby TAD boundaries and genes. | MS Windows, Mac OS X, Linux | Freely available | [106]
CNVScope | A tool for CNV relationship data analysis and visualization, allowing users to create interaction maps, discover CNV map domains, annotate gene interactions, and create interactive visualizations of these CNV interaction maps. | MS Windows, Mac OS X, Linux | Free-software license | [107]
DeAnnCNV | A tool for online detection and annotation of CNVs from WES data. It can extract the shared CNVs among multiple samples and also provides supporting information for the detected CNVs and associated genes. | MS Windows, Mac OS X, Linux | Freely available | [108]
ClassifyCNV | An easy-to-use tool that implements the 2019 ACMG classification guidelines to assess CNV pathogenicity. It uses genomic coordinates and CNV type as input and reports a clinical classification for each variant, a classification score breakdown, and a list of genes of potential importance for variant interpretation. | Mac OS X, Linux, UNIX | Free for academic and research use only | [109]
NA, not available. If applicable, operating systems for individual tools were collected from https://bioinformaticshome.com. If applicable, availability/license information was collected from https://github.com or from the home page of individual tools.

It is essential to produce consistent, evidence-based clinical classification across laboratories and accurate clinical interpretation of CNVs, which requires not only appropriate methods to evaluate genomic content but also correlation of clinical findings with reports in the medical literature. To ensure this, existing standards for evaluating CNVs were recently updated, and detailed recommendations for the interpretation and reporting of constitutional CNVs were published [10]. These recommendations comprise a semiquantitative point-based scoring system in which evidence categories are assigned relative weights. When evaluating individual CNVs, genomic content, dosage sensitivity, predicted functional effect, clinical overlap with patients in the literature, evidence from case and control databases (Table 3), and de novo occurrence or inheritance patterns are considered [10]. Using this scoring system, any evaluated CNV should be assigned to one of the five above-mentioned main classes of clinical impact [95]. It was also demonstrated that topologically associated domains, in which structural alteration results in various malformations, may increase clinical suspicion of pathogenicity for variants of uncertain significance. This piece of information, among others, may help in the clinical interpretation of CNVs that would otherwise be ignored based on current reporting criteria [106]. Appropriate clinical interpretation thus relies on supporting evidence and, therefore, remains challenging. An effective way of overcoming the problem of VUS and achieving progress in clinical interpretation, which may eventually translate into improved patient health care, is to share data and relevant information between laboratories and researchers [110].
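The point-based logic described above can be illustrated with a minimal Python sketch. The score cut-offs used here are the ones we understand the cited standard [10] to define (pathogenic at 0.99 or more, likely pathogenic from 0.90 to 0.98, and the mirrored negative ranges for likely benign and benign); they should be checked against the publication before any practical use. The function names and the example evidence points are hypothetical and serve illustration only.

```python
from typing import Iterable


def classify_total_score(total_score: float) -> str:
    """Map a summed evidence score to one of the five classes.

    Cut-offs are assumed from the ACMG/ClinGen standard [10];
    verify them against the publication before relying on them.
    """
    score = round(total_score, 2)  # avoid floating-point edge effects
    if score >= 0.99:
        return "pathogenic"
    if score >= 0.90:
        return "likely pathogenic"
    if score > -0.90:
        return "VUS"
    if score > -0.99:
        return "likely benign"
    return "benign"


def classify_cnv(evidence_points: Iterable[float]) -> str:
    """Sum the points from all evidence categories and classify."""
    return classify_total_score(sum(evidence_points))


if __name__ == "__main__":
    # Hypothetical points for three evidence categories of one CNV.
    example_points = [0.45, 0.30, 0.15]
    print(classify_cnv(example_points))
```

In this simplified form, every evidence category contributes one signed number; the published standard additionally defines which categories apply to losses and gains and the allowed point range of each category, which the sketch does not model.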

Conclusions
In this work, we provide an overview of CNV detection methods, from basic cytogenetic methods to molecular-based approaches such as aCGH or MPS. Detecting CNVs in individuals and within populations is essential to better understand our genome and to elucidate its possible contribution to disease or phenotype. The growing availability of sequencing technology can help to further explore these functional implications, but since it can yield up to several terabytes of genomic data per run, it is not possible to unlock the full potential of such data without the help of CNV-related bioinformatic tools. Despite all the improvements in methodology and software, clinical interpretation of CNVs remains a major challenge. Moreover, due to improving resolution, the number of novel structural variants is constantly increasing. This has led to a demand for more convenient tools for storing, searching, annotating and evaluating CNV-related data, to increase their practical value for researchers, laboratory diagnosticians and clinical geneticists facing the challenging task of correctly interpreting the clinical impact of CNVs.

Figure 1. Hallmarks in copy number variant (CNV) history. The 20th century saw a steady development of methods, which finally allowed genome-wide, high-resolution CNV detection around the beginning of the 21st century.

Table 1. Bioinformatic tools for detection of CNVs from next generation sequencing-based genomic data. Several tools are capable of detection and annotation of CNVs at the same time (e.g., iCopyDAV, SG-ADVISER-CNV, DeAnnCNV), so they are listed in the next table. WES (whole-exome sequencing); WGS (whole-genome sequencing).
tool that predicts the impact of SVs based on SNP pathogenicity scores across relevant genomic intervals for each SV. The tool assigns a very simple aggregate pathogenicity score to an SV based on overlapping SNP pathogenicity scores. Multiple options for aggregation are supported: maximum, sum, mean and mean of the top N scores.
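The aggregation options listed in this description (maximum, sum, mean and mean of the top N scores) can be illustrated with a short Python sketch. This is not the tool's own code: the function name, the method labels and the input scores are hypothetical, and the sketch assumes that the per-SNP pathogenicity scores overlapping an SV have already been collected into a list.

```python
from statistics import mean
from typing import Sequence


def aggregate_scores(scores: Sequence[float],
                     method: str = "max",
                     top_n: int = 10) -> float:
    """Collapse per-SNP pathogenicity scores into one score for an SV.

    `method` is one of "max", "sum", "mean" or "top_n_mean"
    (hypothetical labels chosen for this illustration).
    """
    if not scores:
        raise ValueError("at least one score is required")
    if method == "max":
        return max(scores)
    if method == "sum":
        return sum(scores)
    if method == "mean":
        return mean(scores)
    if method == "top_n_mean":
        # Mean of the N highest scores (or of all scores if fewer than N).
        return mean(sorted(scores, reverse=True)[:top_n])
    raise ValueError(f"unknown aggregation method: {method}")


if __name__ == "__main__":
    # Hypothetical SNP scores overlapping one SV.
    snp_scores = [1.0, 2.0, 3.0, 6.0]
    for name in ("max", "sum", "mean", "top_n_mean"):
        print(name, aggregate_scores(snp_scores, method=name, top_n=2))
```

The choice of aggregation matters for large SVs: a sum grows with the number of overlapping positions, whereas the maximum or a top-N mean is less sensitive to SV length.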

Table 3. Databases of common and clinically relevant genomic CNVs. The most popular databases that play a crucial role in variant classification are listed here.
An open resource of structural variation for medical and population genetics. The gnomAD structural variant (SV) callset is available via the gnomAD website and integrated directly into the gnomAD Browser.
The world's largest source of expert manually curated somatic mutation information relating to human cancers. The database combines two main types of data: manually curated high precision data and genome-wide screen data, which provide extensive coverage of the cancer genomic landscape from a somatic perspective.
hand-curated breakpoints and other genomic features related to autism, taken from publicly available literature, databases and unpublished data. The database is continuously updated with information from in-house experimental data as well as data from published research studies.
The information in the database relates cytogenetic changes and their genomic consequences, in particular gene fusions, to tumor characteristics, based either on individual cases or associations.