Improving CNV Detection Performance Except for Software-Specific Problematic Regions
Abstract
1. Introduction
2. Materials and Methods
2.1. Sample Information
2.2. Identification of Copy Number Variations (CNVs) from Chromosomal Microarray Analysis (CMA)
2.3. CNV Identification from WES Data
2.4. Performance Evaluation
2.5. Defining Problematic Regions
2.6. Statistical Analysis
3. Results
3.1. Reference CNV Set Construction and Detection Program Evaluation
3.2. Problematic Region in CNV Detection
3.3. Performance Improvement
4. Discussion
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| CI | confidence interval |
| CMA | chromosomal microarray analysis |
| CNV | copy number variation |
| CV | coefficient of variation |
| MLPA | multiplex ligation-dependent probe amplification |
| PPV | positive predictive value |
| RPKM | reads per kilobase per million mapped reads |
| WES | whole-exome sequencing |
| WGS | whole-genome sequencing |



| High-quality true-positive CNVs | Deletions | Duplications |
|---|---|---|
| Number of CNVs | 39 | 59 |
| Size of CNVs | 5.03 kb–4.04 Mb (mean = 292.42 kb) | 53.47 kb–6.91 Mb (mean = 594.91 kb) |
| Number of CNVs per individual | 1–4 (mean = 1.95) | 1–6 (mean = 1.79) |

| N = 39 | ExomeDepth Del | ExomeDepth Dup | CNVkit Del | CNVkit Dup | CoNIFER Del | CoNIFER Dup | cn.MOPS Del | cn.MOPS Dup |
|---|---|---|---|---|---|---|---|---|
| Total calls | 5733 | 4722 | 430 | 352 | 198 | 346 | 951 | 488 |
| Size of CNVs | 0.32 kb–3.98 Mb (33.15 kb) | 0.32 kb–5.14 Mb (49.31 kb) | 0.40 kb–3.98 Mb (143.56 kb) | 3.25 kb–6.95 Mb (279.49 kb) | 2.90 kb–4.88 Mb (178.81 kb) | 2.09 kb–21.54 Mb (277.20 kb) | 2.99 kb–3.59 Mb (41.63 kb) | 3.37 kb–2.96 Mb (49.56 kb) |
| Number of CNVs per individual | 85–233 (mean = 147.0) | 58–228 (mean = 121.08) | 3–23 (mean = 11.03) | 1–23 (mean = 9.03) | 0–14 (mean = 5.08) | 2–25 (mean = 8.87) | 18–30 (mean = 24.38) | 7–24 (mean = 12.51) |

| Size of CNVs | Metric | True positive | False negative |
|---|---|---|---|
| <100 kb | Number of CNVs | 31 | 9 |
| | Mappability score | 0.41–1 (mean = 0.88, median = 0.98) | 0.44–1 (mean = 0.92, median = 1) |
| | RPKM CV | 0.16–0.57 (mean = 0.23, median = 0.19) | 0.17–0.57 (mean = 0.27, median = 0.23) |
| | Number of WES target baits * | 1–52 (mean = 9.13, median = 5) | 1–3 (mean = 1.4, median = 1) |
| 100 kb–500 kb | Number of CNVs | 32 | 7 |
| | Mappability score | 0.24–1 (mean = 0.77, median = 0.86) | 0.24–0.92 (mean = 0.69, median = 0.82) |
| | RPKM CV * | 0.19–0.46 (mean = 0.26, median = 0.23) | 0.24–0.59 (mean = 0.44, median = 0.47) |
| | Number of WES target baits * | 1–104 (mean = 29.28, median = 21) | 1–72 (mean = 13.57, median = 3) |
| >500 kb | Number of CNVs | 17 | 2 |
| | Mappability score | 0.28–1 (mean = 0.84, median = 0.96) | 0.61–1 (mean = 0.81, median = 0.81) |
| | RPKM CV | 0.14–0.27 (mean = 0.21, median = 0.21) | 0.19–0.31 (mean = 0.25, median = 0.25) |
| | Number of WES target baits | 2–308 (mean = 85.23, median = 66) | 1–10 (mean = 5.5, median = 5.5) |
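
The RPKM CV rows above summarize how much normalized read depth varies across samples within a CNV region. The following is a minimal sketch of that kind of calculation, assuming the standard RPKM definition and a sample standard deviation; the function names, the per-sample read counts, and the region length are all hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def rpkm(read_count, region_length_bp, total_mapped_reads):
    """Reads per kilobase of region per million mapped reads."""
    return read_count * 1e9 / (region_length_bp * total_mapped_reads)


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (assumption: ddof = 1)."""
    return stdev(values) / mean(values)


# Hypothetical inputs: (reads in region, total mapped reads) for four samples.
samples = [
    (480, 40_000_000),
    (520, 42_000_000),
    (610, 45_000_000),
    (450, 38_000_000),
]
region_length_bp = 1_200  # hypothetical target region length

rpkm_values = [rpkm(reads, region_length_bp, total) for reads, total in samples]
cv = coefficient_of_variation(rpkm_values)
print(f"RPKM CV across samples = {cv:.2f}")
```

A higher CV indicates noisier depth in the region, which is the direction seen for the false-negative CNVs in the table. Whether the authors used the sample or the population standard deviation is not stated here, so that choice is an assumption of the sketch.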

| | ExomeDepth pre-filter | ExomeDepth post-filter | CNVkit pre-filter | CNVkit post-filter | CoNIFER pre-filter | CoNIFER post-filter | cn.MOPS pre-filter | cn.MOPS post-filter |
|---|---|---|---|---|---|---|---|---|
| Reference truth set | 2788 | 2124 | 2788 | 2710 | 2788 | 2713 | 2788 | 2723 |
| TP | 2114 | 1917 | 1847 | 1844 | 2142 | 2093 | 474 | 441 |
| FP | 83,135 | 16,797 | 17,340 | 6578 | 6838 | 6288 | 11,164 | 2628 |
| FN | 674 | 207 | 941 | 866 | 646 | 620 | 2314 | 2282 |
| Sensitivity | 0.758 | 0.903 | 0.662 | 0.680 | 0.768 | 0.771 | 0.170 | 0.162 |
| PPV | 0.025 | 0.102 | 0.096 | 0.219 | 0.239 | 0.250 | 0.041 | 0.144 |
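
The sensitivity and PPV rows are consistent with the usual definitions, sensitivity = TP / (TP + FN) and PPV = TP / (TP + FP). As an illustrative check, the sketch below recomputes both metrics from the TP, FP, and FN counts in the table and compares them with the reported values; the function and variable names are illustrative, and only the numbers come from the table.

```python
def sensitivity(tp, fn):
    """Fraction of reference CNVs that were detected."""
    return tp / (tp + fn)


def ppv(tp, fp):
    """Fraction of calls that match a reference CNV."""
    return tp / (tp + fp)


# (tool, stage): (TP, FP, FN, reported sensitivity, reported PPV), from the table.
counts = {
    ("ExomeDepth", "pre-filter"): (2114, 83135, 674, 0.758, 0.025),
    ("ExomeDepth", "post-filter"): (1917, 16797, 207, 0.903, 0.102),
    ("CNVkit", "pre-filter"): (1847, 17340, 941, 0.662, 0.096),
    ("CNVkit", "post-filter"): (1844, 6578, 866, 0.680, 0.219),
    ("CoNIFER", "pre-filter"): (2142, 6838, 646, 0.768, 0.239),
    ("CoNIFER", "post-filter"): (2093, 6288, 620, 0.771, 0.250),
    ("cn.MOPS", "pre-filter"): (474, 11164, 2314, 0.170, 0.041),
    ("cn.MOPS", "post-filter"): (441, 2628, 2282, 0.162, 0.144),
}

results = {}
for (tool, stage), (tp, fp, fn, rep_sens, rep_ppv) in counts.items():
    sens = round(sensitivity(tp, fn), 3)
    prec = round(ppv(tp, fp), 3)
    # True when the recomputed values agree with the reported ones.
    matches = abs(sens - rep_sens) < 1e-9 and abs(prec - rep_ppv) < 1e-9
    results[(tool, stage)] = (sens, prec, matches)
    print(f"{tool:10s} {stage:11s} sensitivity={sens:.3f} PPV={prec:.3f} matches={matches}")
```

In each column TP + FN equals the reference truth set size, so the post-filter sensitivity is measured against the reduced truth set that remains after the tool-specific problematic regions are excluded.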
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Hwang, J.; Byeon, J.H.; Eun, B.-L.; Nam, M.-H.; Cho, Y.; Yun, S.G. Improving CNV Detection Performance Except for Software-Specific Problematic Regions. Genes 2026, 17, 105. https://doi.org/10.3390/genes17010105

