BayesCNV: A Bayesian Hierarchical Model for Sensitive and Specific Copy Number Estimation in Cell Free DNA
Abstract
1. Introduction
2. Related Work
| Tool | PMID | Year | High Spatial Resolution | Small-Sample Support | Quantifies CNV | Gains and Losses |
|---|---|---|---|---|---|---|
| CANOES [20] | 24771342 | 2014 | N | N | Y | Y |
| CLAMMS [24] | 26382196 | 2015 | Y | Y | N | Y |
| cn.MOPS [17] | 22302147 | 2012 | N | N | Y | Y |
| CNVkit [18] | 27100738 | 2016 | N | Y | Y | Y |
| CODEX [25] | 25618849 | 2015 | N | Y | Y | Y |
| CoNIFER [21] | 22585873 | 2012 | N | N | Y | Y |
| DeviCNV [16] | 30326846 | 2018 | Y | Y | Y | Y |
| EXCAVATOR2 [19] | 27507884 | 2016 | N | Y | Y | Y |
| ExomeDepth [26] | 22942019 | 2012 | N | Y | Y | Y |
| ExonDel [27] | 25322818 | 2014 | Y | Y | Y | N |
| FishingCNV [28] | 23539306 | 2013 | N | N | Y | Y |
| HMZDelFinder [29] | 27980096 | 2017 | N | Y | Y | N |
| IonCopy [15] | 26910888 | 2016 | Y | Y | Y | Y |
| XHMM [22] | 23040492 | 2012 | N | N | Y | Y |
3. Materials and Methods
3.1. Overall Processing Steps
3.2. Biological Assumptions
- Linearity of read counts. Expected read counts scale approximately linearly with copy number. For example, a locus present at four copies yields, on average, twice as many reads as a locus with two copies.
- Sample-composition invariance. Observed read depth reflects the total DNA mixture in the sample, and tumor-derived DNA is processed in the same way as background DNA within the assay. Consequently, the magnitude of the observed copy number signal scales with tumor fraction.
- Stable amplicon-specific effects. While amplification efficiency varies across amplicons, these effects are assumed to be consistent across samples processed under comparable conditions. Therefore, the case and normal samples should be process-matched.
- Sparsity of CNVs. The majority of amplicon target loci are assumed to be copy-neutral, with CNVs affecting only a minority of targets.
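The first two assumptions can be combined into a small worked example. The sketch below is illustrative only: the function name `expected_copy_ratio` and the diploid background of two copies are assumptions made here, not part of the published model. It shows how the expected copy ratio of a locus shrinks toward 1 as the tumor fraction falls.

```python
def expected_copy_ratio(tumor_fraction, tumor_copies, background_copies=2.0):
    """Expected read-depth ratio of a mixed sample relative to copy-neutral DNA.

    Assumes read counts scale linearly with copy number and that tumor-derived
    and background DNA are processed identically by the assay.
    background_copies=2.0 is an illustrative diploid baseline (an assumption).
    """
    if not 0.0 <= tumor_fraction <= 1.0:
        raise ValueError("tumor_fraction must lie in [0, 1]")
    if background_copies <= 0:
        raise ValueError("background_copies must be positive")
    # Average copy number of the locus across the whole DNA mixture.
    mixture_copies = (
        (1.0 - tumor_fraction) * background_copies
        + tumor_fraction * tumor_copies
    )
    return mixture_copies / background_copies


if __name__ == "__main__":
    # A four-copy gain observed at decreasing tumor fractions.
    for fraction in (1.0, 0.5, 0.1, 0.01):
        ratio = expected_copy_ratio(fraction, tumor_copies=4)
        print(f"tumor fraction {fraction:>4}: expected copy ratio {ratio:.3f}")
```

Under these assumptions, a four-copy locus that would double the read depth in pure tumor DNA raises it by only about 10% at a tumor fraction of 0.1, which is why small deviations from neutrality matter in cell-free DNA.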
3.3. Mathematical Modeling
3.4. Calling CNVs via Posterior Distribution
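- Effect size threshold. The posterior mean should exceed a minimum magnitude, that is, $|\mathbb{E}[\theta \mid \text{data}]| > \delta$, where $\theta$ is the log copy number ratio (lCNR) of the gene and $\delta$ is the copy ratio we wish to be able to detect.
- Confidence threshold. The posterior probability of lying close to 0 (CNV neutral) should be small: $P(|\theta| < \epsilon \mid \text{data}) < \alpha$, for some values of $\epsilon$ and $\alpha$. Here, $\epsilon$ indicates the size of variation we expect neutral genes to have and $\alpha$ is the probability of a false positive we are willing to accept.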
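The two criteria above can be applied directly to posterior draws for a single gene. The following is a minimal sketch, not the authors' implementation: the function `call_cnv` and its parameter names (`min_effect`, `neutral_width`, `max_neutral_prob`) are hypothetical, and the draws are assumed to be on a scale where 0 means copy-neutral.

```python
def call_cnv(posterior_samples, min_effect, neutral_width, max_neutral_prob):
    """Apply the effect size and confidence thresholds to posterior draws.

    posterior_samples: draws of a gene-level effect where 0 means CNV neutral.
    min_effect:        minimum |posterior mean| required (effect size threshold).
    neutral_width:     half-width of the region around 0 treated as neutral.
    max_neutral_prob:  largest acceptable posterior mass inside that region.
    All names are illustrative assumptions.
    """
    samples = [float(s) for s in posterior_samples]
    if not samples:
        raise ValueError("need at least one posterior sample")

    # Monte Carlo estimate of the posterior mean.
    posterior_mean = sum(samples) / len(samples)
    # Fraction of draws that fall inside the neutral region around 0.
    neutral_prob = sum(abs(s) < neutral_width for s in samples) / len(samples)

    passes_effect = abs(posterior_mean) > min_effect
    passes_confidence = neutral_prob < max_neutral_prob
    return {
        "posterior_mean": posterior_mean,
        "neutral_prob": neutral_prob,
        "is_cnv": passes_effect and passes_confidence,
    }


if __name__ == "__main__":
    draws = [0.50, 0.60, 0.40, 0.55]  # toy draws, not real data
    print(call_cnv(draws, min_effect=0.2, neutral_width=0.1, max_neutral_prob=0.05))
```

Requiring both conditions guards against two failure modes: a posterior that is tightly concentrated but too close to neutral to be of interest, and a posterior whose mean is large only because it is diffuse or bimodal with substantial mass near 0.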
3.5. Markov Chain Monte Carlo Inference
3.6. Quality Control via Likelihood Evaluation
3.7. Code Availability
4. Results
4.1. Comparing BayesCNV to DeviCNV
4.2. Limit of Detection with Synthetic Data
4.3. Likelihood-Based Sample Filtering
5. Discussion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| CNV | Copy Number Variation |
| FFPE | Formalin-Fixed Paraffin-Embedded |
| HMCMC | Hamiltonian Monte Carlo |
| lCNR | Log Copy Number Ratio |
| LOD | Limit of Detection |
| MCMC | Markov Chain Monte Carlo |
| NGS | Next-Generation Sequencing |
| NUTS | No-U-Turn Sampler |
| QC | Quality Control |
| TI | Thermodynamic Integration |
| VAF | Variant Allele Frequency |
| WES | Whole-Exome Sequencing |
| WGS | Whole-Genome Sequencing |
References
- Pös, O.; Radvanszky, J.; Buglyó, G.; Pös, Z.; Rusnakova, D.; Nagy, B.; Szemes, T. DNA copy number variation: Main characteristics, evolutionary significance, and pathological aspects. Biomed. J. 2021, 44, 548–559.
- Shlien, A.; Malkin, D. Copy number variations and cancer. Genome Med. 2009, 1, 62.
- Vijay, A.; Garg, I.; Ashraf, M.Z. Perspective: DNA Copy Number Variations in Cardiovascular Diseases. Epigenet. Insights 2018, 11, 2516865718818839.
- Fan, D.; Yang, X.; Huang, L.; Ouyang, G.; Yang, X.; Li, M. Simultaneous detection of target CNVs and SNVs of thalassemia by multiplex PCR and next-generation sequencing. Mol. Med. Rep. 2019, 19, 2837–2848.
- Masood, D.; Ren, L.; Nguyen, C.; Brundu, F.G.; Zheng, L.; Zhao, Y.; Jaeger, E.; Li, Y.; Cha, S.W.; Halpern, A.; et al. Evaluation of somatic copy number variation detection by NGS technologies and bioinformatics tools on a hyper-diploid cancer genome. Genome Biol. 2024, 25, 163.
- Yazaki, S.; Tokura, M.; Aiba, H.; Kojima, Y.; Shiraishi, K. Clinical applications of cell-free DNA-based liquid biopsy analysis. Transl. Oncol. 2025, 61, 102519.
- Ma, L.; Guo, H.; Zhao, Y.; Liu, Z.; Wang, C.; Bu, J.; Sun, T.; Wei, J. Liquid biopsy in cancer: Current status, challenges and future prospects. Signal Transduct. Target. Ther. 2024, 9, 336.
- Di Sario, G.; Rossella, V.; Famulari, E.S.; Maurizio, A.; Lazarevic, D.; Giannese, F.; Felici, C. Enhancing clinical potential of liquid biopsy through a multi-omic approach: A systematic review. Front. Genet. 2023, 14, 1152470.
- Martignano, F.; Munagala, U.; Crucitta, S.; Mingrino, A.; Semeraro, R.; Del Re, M.; Petrini, I.; Magi, A.; Conticello, S.G. Nanopore sequencing from liquid biopsy: Analysis of copy number variations from cell-free DNA of lung cancer patients. Mol. Cancer 2021, 20, 32.
- Hallermayr, A.; Wohlfrom, T.; Steinke-Lange, V.; Benet-Pagès, A.; Scharf, F.; Heitzer, E.; Mansmann, U.; Haberl, C.; De Wit, M.; Vogelsang, H.; et al. Somatic copy number alteration and fragmentation analysis in circulating tumor DNA for cancer screening and treatment monitoring in colorectal cancer patients. J. Hematol. Oncol. 2022, 15, 125.
- Antonello, A.; Bergamin, R.; Calonaci, N.; Househam, J.; Milite, S.; Williams, M.J.; Anselmi, F.; d’Onofrio, A.; Sundaram, V.; Sosinsky, A.; et al. Computational validation of clonal and subclonal copy number alterations from bulk tumor sequencing using CNAqc. Genome Biol. 2024, 25, 38.
- Singh, A.K.; Olsen, M.F.; Lavik, L.A.S.; Vold, T.; Drabløs, F.; Sjursen, W. Detecting copy number variation in next generation sequencing data from diagnostic gene panels. BMC Med. Genom. 2021, 14, 214.
- Moreno-Cabrera, J.M.; Del Valle, J.; Castellanos, E.; Feliubadaló, L.; Pineda, M.; Brunet, J.; Serra, E.; Capellà, G.; Lázaro, C.; Gel, B. Evaluation of CNV detection tools for NGS panel data in genetic diagnostics. Eur. J. Hum. Genet. 2020, 28, 1645–1655.
- Gelman, A.; Meng, X.-L. Simulating normalizing constants: From importance sampling to bridge sampling to path sampling. Stat. Sci. 1998, 13, 163–185.
- Budczies, J.; Pfarr, N.; Stenzinger, A.; Treue, D.; Endris, V.; Ismaeel, F.; Bangemann, N.; Blohmer, J.-U.; Dietel, M.; Loibl, S.; et al. Ioncopy: A novel method for calling copy number alterations in amplicon sequencing data including significance assessment. Oncotarget 2016, 7, 13236–13247.
- Kang, Y.; Nam, S.-H.; Park, K.S.; Kim, Y.; Kim, J.-W.; Lee, E.; Ko, J.M.; Lee, K.-A.; Park, I. DeviCNV: Detection and visualization of exon-level copy number variants in targeted next-generation sequencing data. BMC Bioinform. 2018, 19, 381.
- Klambauer, G.; Schwarzbauer, K.; Mayr, A.; Clevert, D.-A.; Mitterecker, A.; Bodenhofer, U.; Hochreiter, S. cn.MOPS: Mixture of Poissons for discovering copy number variations in next-generation sequencing data with a low false discovery rate. Nucleic Acids Res. 2012, 40, e69.
- Talevich, E.; Shain, A.H.; Botton, T.; Bastian, B.C. CNVkit: Genome-Wide Copy Number Detection and Visualization from Targeted DNA Sequencing. PLoS Comput. Biol. 2016, 12, e1004873.
- D’Aurizio, R.; Pippucci, T.; Tattini, L.; Giusti, B.; Pellegrini, M.; Magi, A. Enhanced copy number variants detection from whole-exome sequencing data using EXCAVATOR2. Nucleic Acids Res. 2016, 44, e154.
- Backenroth, D.; Homsy, J.; Murillo, L.R.; Glessner, J.; Lin, E.; Brueckner, M.; Lifton, R.; Goldmuntz, E.; Chung, W.K.; Shen, Y. CANOES: Detecting rare copy number variants from whole exome sequencing data. Nucleic Acids Res. 2014, 42, e97.
- Krumm, N.; Sudmant, P.H.; Ko, A.; O’Roak, B.J.; Malig, M.; Coe, B.P.; NHLBI Exome Sequencing Project; Quinlan, A.R.; Nickerson, D.A.; Eichler, E.E. Copy number variation detection and genotyping from exome sequence data. Genome Res. 2012, 22, 1525–1532.
- Fromer, M.; Moran, J.L.; Chambert, K.; Banks, E.; Bergen, S.E.; Ruderfer, D.M.; Handsaker, R.E.; McCarroll, S.A.; O’Donovan, M.C.; Owen, M.J.; et al. Discovery and Statistical Genotyping of Copy-Number Variation from Whole-Exome Sequencing Depth. Am. J. Hum. Genet. 2012, 91, 597–607.
- Talbot, A.; Kotlar, A.; Rishishwar, L.; Ke, Y. Classifying Copy Number Variations Using State Space Modeling of Targeted Sequencing Data: A Case Study in Thalassemia. In Proceedings of the Machine Learning for Healthcare 2025, Rochester, MN, USA, 15 August 2025.
- Packer, J.S.; Maxwell, E.K.; O’Dushlaine, C.; Lopez, A.E.; Dewey, F.E.; Chernomorsky, R.; Baras, A.; Overton, J.D.; Habegger, L.; Reid, J.G. CLAMMS: A scalable algorithm for calling common and rare copy number variants from exome sequencing data. Bioinformatics 2016, 32, 133–135.
- Jiang, Y.; Oldridge, D.A.; Diskin, S.J.; Zhang, N.R. CODEX: A normalization and copy number variation detection method for whole exome sequencing. Nucleic Acids Res. 2015, 43, e39.
- Plagnol, V.; Curtis, J.; Epstein, M.; Mok, K.Y.; Stebbings, E.; Grigoriadou, S.; Wood, N.W.; Hambleton, S.; Burns, S.O.; Thrasher, A.J.; et al. A robust model for read count data in exome sequencing experiments and implications for copy number variant calling. Bioinformatics 2012, 28, 2747–2754.
- Guo, Y.; Zhao, S.; Lehmann, B.D.; Sheng, Q.; Shaver, T.M.; Stricker, T.P.; Pietenpol, J.A.; Shyr, Y. Detection of internal exon deletion with exon Del. BMC Bioinform. 2014, 15, 332.
- Shi, Y.; Majewski, J. FishingCNV: A graphical software package for detecting rare copy number variations in exome-sequencing data. Bioinformatics 2013, 29, 1461–1462.
- Gambin, T.; Akdemir, Z.C.; Yuan, B.; Gu, S.; Chiang, T.; Carvalho, C.M.B.; Shaw, C.; Jhangiani, S.; Boone, P.M.; Eldomery, M.K.; et al. Homozygous and hemizygous CNV detection from exome sequencing data in a Mendelian disease cohort. Nucleic Acids Res. 2017, 45, 1633–1648.
- Li, H.; Durbin, R. Fast and accurate long-read alignment with Burrows–Wheeler transform. Bioinformatics 2010, 26, 589–595.
- Li, H.; Handsaker, B.; Wysoker, A.; Fennell, T.; Ruan, J.; Homer, N.; Marth, G.; Abecasis, G.; Durbin, R.; 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics 2009, 25, 2078–2079.
- Gelman, A.; Carlin, J.; Stern, H.; Rubin, D. Bayesian Data Analysis; Chapman and Hall/CRC: Boca Raton, FL, USA, 1995.
- Robert, C.; Casella, G. Monte Carlo Statistical Methods; Springer: Berlin/Heidelberg, Germany, 1999.
- Betancourt, M. A Conceptual Introduction to Hamiltonian Monte Carlo. arXiv 2018, arXiv:1701.02434.
- Hoffman, M.D.; Gelman, A. The No-U-Turn sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo. J. Mach. Learn. Res. 2014, 15, 1593–1623.
- Raftery, A.E.; Newton, M.A.; Satagopan, J.M.; Krivitsky, P.N. Estimating the Integrated Likelihood via Posterior Simulation Using the Harmonic Mean Identity. Bayesian Stat. 2007, 8, 371–416.
- Neal, R.M. Annealed Importance Sampling. Stat. Comput. 2001, 11, 125–139.
- Lai, G.; Xie, B.; Zhang, C.; Zhong, X.; Deng, J.; Li, K.; Liu, H.; Zhang, Y.; Liu, A.; Liu, Y.; et al. Comprehensive analysis of immune subtype characterization on identification of potential cells and drugs to predict response to immune checkpoint inhibitors for hepatocellular carcinoma. Genes Dis. 2025, 12, 101471.
- Zhang, Y.; Zhang, C.; He, J.; Lai, G.; Li, W.; Zeng, H.; Zhong, X.; Xie, B. Comprehensive analysis of single cell and bulk RNA sequencing reveals the heterogeneity of melanoma tumor microenvironment and predicts the response of immunotherapy. Inflamm. Res. 2024, 73, 1393–1409.
- Chu, T.; Wang, Z.; Pe’er, D.; Danko, C.G. Cell type and gene expression deconvolution with BayesPrism enables Bayesian integrative analysis across bulk and single-cell RNA sequencing in oncology. Nat. Cancer 2022, 3, 505–517.
- Jia, C. Kinetic foundation of the zero-inflated negative binomial model for single-cell RNA sequencing data. arXiv 2019, arXiv:1911.00356.
- Chen, X.; Fang, L.T.; Chen, Z.; Chen, W.; Wu, H.; Zhu, B.; Moos, M.; Farmer, A.; Zhang, X.; Xiong, W.; et al. A benchmarking study of copy number variation inference methods using single-cell RNA-sequencing data. Precis. Clin. Med. 2025, 8, pbaf011.
- Choi, H.Y.; Jo, H.; Zhao, X.; Hoadley, K.A.; Newman, S.; Holt, J.; Hayward, M.C.; Love, M.I.; Marron, J.S.; Hayes, D.N. SCISSOR: A framework for identifying structural changes in RNA transcripts. Nat. Commun. 2021, 12, 286.




| Method | TP | TN | FP | FN | Sensitivity | Specificity |
|---|---|---|---|---|---|---|
| IonCopy | 14 | 686 | 159 | 1 | 0.93 | 0.81 |
| DeviCNV | 0 | 813 | 32 | 15 | 0 | 0.96 |
| BayesCNV | 13 | 842 | 3 | 2 | 0.87 | 0.996 |
| BayesCNV + QC | 13 | 825 | 0 | 2 | 0.87 | 1 |
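The sensitivity and specificity columns follow from the confusion-matrix counts in the table. As a quick check, the sketch below recomputes them with the standard definitions; the helper names and the `COUNTS` dictionary are introduced here for illustration, with the counts copied from the table.

```python
def sensitivity(tp, fn):
    """True positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: TN / (TN + FP)."""
    return tn / (tn + fp)


# (TP, TN, FP, FN) for each method, copied from the table above.
COUNTS = {
    "IonCopy": (14, 686, 159, 1),
    "DeviCNV": (0, 813, 32, 15),
    "BayesCNV": (13, 842, 3, 2),
    "BayesCNV + QC": (13, 825, 0, 2),
}

if __name__ == "__main__":
    for method, (tp, tn, fp, fn) in COUNTS.items():
        sens = sensitivity(tp, fn)
        spec = specificity(tn, fp)
        print(f"{method:<14} sensitivity={sens:.3f} specificity={spec:.3f}")
```

The BayesCNV + QC row sums to 840 evaluations rather than 860, which would be consistent with the QC step excluding some negative cases before calling.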
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Talbot, A.; Kotlar, A.; Rishishwar, L.; Conley, A.; Zhao, M.; Yang, N.; Liu, M.; Wang, Z.; Polvino, S.; Ke, Y. BayesCNV: A Bayesian Hierarchical Model for Sensitive and Specific Copy Number Estimation in Cell Free DNA. Diagnostics 2026, 16, 280. https://doi.org/10.3390/diagnostics16020280
