PSMC-FAC: Automated Optimization of False-Negative Rate Corrections for Low-Coverage PSMC-Based Demographic Inference
Simple Summary
Abstract
1. Introduction
2. Materials and Methods
2.1. Dataset Preparation
2.2. A Mathematical Approach to Compute FNR for Low-Coverage Samples
2.3. Hausdorff Distance
2.3.1. Discrete Fréchet Distance
2.3.2. A Combination of Both Methods
2.4. FNR and Heterozygosity as a Function of Coverage
2.5. Sum-of-Least-Squares Assessment of Goodness-of-Fit
3. Results
3.1. PSMC-FAC Enables Accurate FNR-Based Correction Across Species and Coverages
3.2. Appropriate FNR-Based Correction Depends on Recent Demographic History
3.3. FNR Corrections Are Robust Across Diverse Demographic Histories
4. Discussion
4.1. FNR Correction in Low- and Mid-Depth Genomes: Reference Genome Effect
4.2. Effects of Biases Introduced by PSMC Assumptions on Optimal FNR Calculation
4.3. Polynomial Relationship Between Coverage and Optimal FNR
4.4. Empirical Applications and Future Implications
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| PSMC | Pairwise Sequentially Markovian Coalescent |
| FNR | False-Negative Rate |
| PSMC-FAC | PSMC False-Negative Rate Automatized Correction |
| WGS | Whole-Genome Sequencing |
| Ne | Effective Population Size |
| TMRCA | Time to the Most Recent Common Ancestor |
| ARG | Ancestral Recombination Graph |
| HMM | Hidden Markov Model |
| SMC | Sequentially Markov Coalescent |
| MSMC | Multiple Sequentially Markovian Coalescent |
| RAD | Restriction Site Associated DNA |
| BAM | Binary Alignment/Map format |
| CRAM | Compressed Reference-oriented Alignment Map format |
| VCF | Variant Call Format |
| PSMCFA | PSMC FASTA-like input format |
| PCR | Polymerase Chain Reaction |
| SSE | Sum of Squared Errors |
| CHB | Han Chinese in Beijing, China (1000 Genomes Project population) |
| YRI | Yoruba in Ibadan, Nigeria (1000 Genomes Project population) |
| TSI | Toscani in Italia (1000 Genomes Project population) |
| 1000GP | 1000 Genomes Project |
| ARS-UCD1.2 | Cattle Reference Genome Assembly (Agricultural Research Service, United States Department of Agriculture, and University of California, Davis), Version 1.2 |
| GRCh38 | Genome Reference Consortium Human Build 38 |
| Hg38 | Human Genome version 38 |
| CanFam3.1 | Dog Reference Genome Assembly Version 3.1 |
| BosTau9 | Cattle Reference Genome Assembly Version 9 |
| kya | Thousand Years Ago |
| Mya | Million Years Ago |
| DNA | Deoxyribonucleic Acid |
| R² | Coefficient of Determination |
Appendix A. Computational Workflow for PSMC Processing and FNR Estimation Using PSMC-FAC
Appendix A.1. Preparation of PSMC Input Files
- where the time interval configuration (time_vector) is species-specific, as described in the Dataset Preparation subsection (Section 2.1) of the Materials and Methods. Although many studies use the default time_vector described in [5], that default was originally designed for human populations.
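As a rough illustration of what a time interval configuration encodes, the sketch below expands a pattern written in the syntax of the `-p` option of `psmc` [5]. It assumes that time_vector here corresponds to such a pattern; the helper name `expand_pattern` is hypothetical and not part of PSMC or PSMC-FAC. The only pattern used is the human-oriented default `4+25*2+4+6`.

```python
def expand_pattern(pattern: str) -> list[int]:
    """Expand a PSMC-style interval pattern into per-parameter widths.

    Each '+'-separated term is either 'n' (one free parameter spanning
    n atomic time intervals) or 'k*n' (k free parameters, each spanning
    n atomic time intervals).
    """
    widths: list[int] = []
    for term in pattern.split("+"):
        if "*" in term:
            repeat, width = (int(x) for x in term.split("*"))
        else:
            repeat, width = 1, int(term)
        widths.extend([width] * repeat)
    return widths


# Default pattern from [5]; assumed here to be what time_vector holds.
default_pattern = "4+25*2+4+6"
widths = expand_pattern(default_pattern)

n_free_parameters = len(widths)    # number of free interval parameters
n_atomic_intervals = sum(widths)   # number of atomic time intervals
print(n_free_parameters, n_atomic_intervals)  # 28 64
```

A species-specific configuration would change the terms of the pattern, and therefore how many atomic intervals share a parameter at each time depth; appropriate values depend on the dataset and are not suggested here.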
Appendix A.2. Usage of PSMC-FAC
Appendix A.2.1. Summary
Appendix A.2.2. User Manual
Appendix A.3. Plotting Other Low-Coverage Genomes According to PSMC-FAC-Assisted FNR Correction
Appendix B. Appendix Figures





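| Common Name | Scientific Name | Population | Sample Number | Coverage | Heterozygosity | Source |
|---|---|---|---|---|---|---|
| Cow | Bos taurus | Angus breed | 19879801 | 19.18X | | (A) |
| Cow | Bos taurus | Brangus breed | 19999911 | 39.47X | | (A) |
| Cow | Bos taurus | Beefmaster breed | 19999927 | 31.11X | | (A) |
| Grey wolf | Canis lupus | C. l. italicus (Italian wolf) | SAMEA116045429 | 23.67X | | (B) |
| Grey wolf | Canis lupus | C. l. italicus (Italian wolf) | SAMEA116045431 | 27.19X | | (B) |
| Grey wolf | Canis lupus | C. l. italicus (Italian wolf) | SAMEA116045435 | 26.08X | | (B) |
| Grey wolf | Canis lupus | C. l. signatus (Iberian wolf) | SAMN43221691 | 20.42X | | (C) |
| Grey wolf | Canis lupus | C. l. signatus (Iberian wolf) | SAMN43221682 | 19.03X | | (C) |
| Grey wolf | Canis lupus | C. l. signatus (Iberian wolf) | SAMN04851099 | 18.08X | | (C) |
| Human | Homo sapiens | Han Chinese (CHB) | NA18543 | 29.59X | | (D) |
| Human | Homo sapiens | Han Chinese (CHB) | NA18544 | 29.53X | | (D) |
| Human | Homo sapiens | Han Chinese (CHB) | NA18559 | 33.34X | | (D) |
| Human | Homo sapiens | Yoruba, Nigeria (YRI) | NA18867 | 30.18X | | (D) |
| Human | Homo sapiens | Yoruba, Nigeria (YRI) | NA18924 | 31.27X | | (D) |
| Human | Homo sapiens | Yoruba, Nigeria (YRI) | NA19096 | 31.35X | | (D) |
| Human | Homo sapiens | Toscani, Italy (TSI) | NA20754 | 31.63X | | (D) |
| Human | Homo sapiens | Toscani, Italy (TSI) | NA20759 | 32.65X | | (D) |
| Human | Homo sapiens | Toscani, Italy (TSI) | NA20766 | 29.95X | | (D) |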
References
- Aimé, C.; Verdu, P.; Ségurel, L.; Martinez-Cruz, B.; Hegay, T.; Heyer, E.; Austerlitz, F. Microsatellite data show recent demographic expansions in sedentary but not in nomadic human populations in Africa and Eurasia. Eur. J. Hum. Genet. 2014, 22, 1201–1207.
- Miller, E.F.; Manica, A.; Amos, W. Global demographic history of human populations inferred from whole mitochondrial genomes. R. Soc. Open Sci. 2018, 5, 180543.
- Eddine, A.; Gomes Rocha, R.; Mostefai, N.; Karssene, Y.; De Smet, K.; Brito, J.C.; Klees, D.; Nowak, C.; Cocchiararo, B.; Lopes, S.; et al. Demographic expansion of an African opportunistic carnivore during the Neolithic revolution. Biol. Lett. 2020, 16, 20190560.
- Csapó, H.; Jabłońska, A.; Węsławski, J.M.; Mieszkowska, N.; Gantsevich, M.; Dahl-Hansen, I.; Renaud, P.; Grabowski, M. mtDNA data reveal disparate population structures and High Arctic colonization patterns in three intertidal invertebrates with contrasting life history traits. Front. Mar. Sci. 2023, 10, 1275320.
- Li, H.; Durbin, R. Inference of human population history from individual whole-genome sequences. Nature 2011, 475, 493–496.
- MacLeod, I.M.; Larkin, D.M.; Lewin, H.A.; Hayes, B.J.; Goddard, M.E. Inferring demography from runs of homozygosity in whole-genome sequence, with correction for sequence errors. Mol. Biol. Evol. 2013, 30, 2209–2223.
- Kim, H.; Ratan, A.; Perry, G.H.; Montenegro, A.; Miller, W.; Schuster, S.C. Khoisan hunter-gatherers have been the largest population throughout most of modern-human demographic history. Nat. Commun. 2014, 5, 6692.
- Hawkins, M.T.R.; Culligan, R.R.; Frasier, C.L.; Dikow, R.B.; Hagenson, R.; Lei, R.; Louis, E.E. Genome sequence and population declines in the critically endangered greater bamboo lemur (Prolemur simus) and implications for conservation. BMC Genom. 2018, 19, 445.
- Nadachowska-Brzyska, K.; Burri, R.; Smeds, L.; Ellegren, H. PSMC analysis of effective population sizes in molecular ecology and its application to black-and-white Ficedula flycatchers. Mol. Ecol. 2016, 25, 1058–1072.
- Kingman, J.F.C. On the genealogy of large populations. J. Appl. Probab. 1982, 19, 27–43.
- Wakeley, J. Developments in coalescent theory from single loci to chromosomes. Theor. Popul. Biol. 2020, 133, 56–64.
- McVean, G.A.T.; Cardin, N.J. Approximating the coalescent with recombination. Philos. Trans. R. Soc. B 2005, 360, 1387.
- Wiuf, C.; Hein, J. Recombination as a Point Process along Sequences. Theor. Popul. Biol. 1999, 55, 248–259.
- Mather, N.; Traves, S.M.; Ho, S.Y.W. A practical introduction to sequentially Markovian coalescent methods for estimating demographic history from genomic data. Ecol. Evol. 2020, 10, 579–589.
- Peede, D.; Bañuelos, M.M.; Medina Tretmanis, J.; Miyagi, M.; Huerta-Sánchez, E. Recent advances in methods to characterize archaic introgression in modern humans. Genome Res. 2026, 36, 239–256.
- Sellinger, T.P.P.; Abu-Awad, D.; Tellier, A. Limits and convergence properties of the sequentially Markovian coalescent. Mol. Ecol. Resour. 2021, 21, 2231–2248.
- Cousins, T.; Tabin, D.; Patterson, N.; Reich, D.; Durvasula, A. Accurate inference of population history in the presence of background selection. bioRxiv 2024.
- Mazet, O.; Rodríguez, W.; Grusea, S.; Boitard, S.; Chikhi, L. On the importance of being structured: Instantaneous coalescence rates and human evolution—lessons for ancestral population size inference? Heredity 2016, 116, 362–371.
- Chikhi, L.; Rodríguez, W.; Grusea, S.; Santos, P.; Boitard, S.; Mazet, O. The IICR (inverse instantaneous coalescence rate) as a summary of genomic diversity. Heredity 2018, 120, 13–24.
- Nieto, A.; Lao, O.; Mona, S. Performance of Sequential Markovian Coalescence Methods when Populations are Structured. bioRxiv 2025.
- Hilgers, L.; Liu, S.; Jensen, A.; Brown, T.; Cousins, T.; Schweiger, R.; Guschanski, K.; Hiller, M. Avoidable false PSMC population size peaks occur across numerous studies. Curr. Biol. 2025, 35, 927–930.e3.
- Schiffels, S.; Durbin, R. Inferring human population size and separation history from multiple genome sequences. Nat. Genet. 2014, 46, 919–925.
- Terhorst, J.; Kamm, J.A.; Song, Y.S. Robust and scalable inference of population history from hundreds of unphased whole genomes. Nat. Genet. 2016, 49, 303–309.
- Cousins, T.; Scally, A.; Durbin, R. A structured coalescent model reveals deep ancestral structure shared by all modern humans. Nat. Genet. 2025, 57, 856–864.
- Hey, J.; Machado, C.A. The study of structured populations—new hope for a difficult and divided science. Nat. Rev. Genet. 2003, 4, 535–543.
- Pritchard, J.K.; Stephens, M.; Donnelly, P. Inference of population structure using multilocus genotype data. Genetics 2000, 155, 945–959.
- Sarabia, C.; von Holdt, B.; Larrasoaña, J.C.; Uríos, V.; Leonard, J.A. Pleistocene climate fluctuations drove demographic history of African golden wolves (Canis lupaster). Mol. Ecol. 2021, 30, 6101–6120.
- Cock, P.J.; Fields, C.J.; Goto, N.; Heuer, M.L.; Rice, P.M. The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Res. 2010, 38, 1767–1771.
- Lindblad-Toh, K.; Wade, C.M.; Mikkelsen, T.S.; Karlsson, E.K.; Jaffe, D.B.; Kamal, M.; Clamp, M.; Chang, J.L.; Kulbokas, E.J., III; Zody, M.C.; et al. Genome sequence, comparative analysis and haplotype structure of the domestic dog. Nature 2005, 438, 803–819.
- Li, H.; Durbin, R. Fast and accurate long-read alignment with Burrows–Wheeler transform. Bioinformatics 2010, 26, 589–595.
- Bonfield, J.K. CRAM 3.1: Advances in the CRAM file format. Bioinformatics 2022, 38, 1497–1503.
- 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 2015, 526, 68–74.
- Li, H.; Handsaker, B.; Wysoker, A.; Fennell, T.; Ruan, J.; Homer, N.; Marth, G.; Abecasis, G.; Durbin, R.; 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics 2009, 25, 2078–2079.
- USDA Agricultural Research Service (ARS). Bovine Reference Genome and Whole-Genome Sequencing Data. U.S. Department of Agriculture. 2025. Available online: https://www.ars.usda.gov/plains-area/clay-center-ne/marc/wgs/bovref/ (accessed on 7 December 2025).
- Heaton, M.P.; Smith, T.P.L.; Carnahan, J.K.; Basnayake, V.; Qiu, J.; Simpson, B.; Kalbfleisch, T.S. Using diverse U.S. beef cattle genomes to identify missense mutations in EPAS1, a gene associated with high-altitude pulmonary hypertension. F1000Research 2016, 5, 2003.
- Danecek, P.; Bonfield, J.K.; Liddle, J.; Marshall, J.; Ohan, V.; Pollard, M.O.; Whitwham, A.; Keane, T.; McCarthy, S.A.; Davies, R.M.; et al. Twelve years of SAMtools and BCFtools. GigaScience 2021, 10, giab008.
- Schneider, V.A.; Graves-Lindsay, T.; Howe, K.; Bouk, N.; Chen, H.C.; Kitts, P.A.; Murphy, T.D.; Pruitt, K.D.; Thibaud-Nissen, F.; Albracht, D.; et al. Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly. Genome Res. 2017, 27, 849–864.
- Rosen, B.D.; Bickhart, D.M.; Schnabel, R.D.; Koren, S.; Elsik, C.G.; Tseng, E.; Rowan, T.N.; Low, W.Y.; Zimin, A.; Couldrey, C.; et al. De novo assembly of the cattle reference genome with single-molecule sequencing. GigaScience 2020, 9, giaa021.
- Danecek, P.; Auton, A.; Abecasis, G.; Albers, C.A.; Banks, E.; DePristo, M.A.; Handsaker, R.E.; Lunter, G.; Marth, G.T.; Sherry, S.T.; et al. The variant call format and VCFtools. Bioinformatics 2011, 27, 2156–2158.
- Freedman, A.H.; Gronau, I.; Schweizer, R.M.; Ortega-Del Vecchyo, D.; Han, E.; Silva, P.M.; Galaverni, M.; Fan, Z.; Marx, P.; Lorente-Galdos, B.; et al. Genome sequencing highlights the dynamic early history of dogs. PLoS Genet. 2014, 10, e1004016.
- Mei, C.; Wang, H.; Liao, Q.; Wang, L.; Cheng, G.; Wang, H.; Zhao, C.; Zhao, S.; Song, J.; Guang, X.; et al. Genetic architecture and selection of Chinese cattle revealed by whole genome resequencing. Mol. Biol. Evol. 2018, 35, 688–699.
- Liu, X.; Li, Z.; Yan, Y.; Li, Y.; Wu, H.; Pei, J.; Yan, P.; Yang, R.; Guo, X.; Lan, X. Selection and introgression facilitated the adaptation of Chinese native endangered cattle in extreme environments. Evol. Appl. 2020, 14, 860–873.
- Alt, H.; Behrends, B.; Blömer, J. Approximate matching of polygonal shapes. Ann. Math. Artif. Intell. 1995, 13, 251–265.
- Ahn, H.K.; Knauer, C.; Scherfenberg, M.; Schlipf, L.; Vigneron, A. Computing the discrete Fréchet distance with imprecise input. Lect. Notes Comput. Sci. 2010, 6507, 422–433.
- Fuentes-Pardo, A.P.; Ruzzante, D.E. Whole-genome sequencing approaches for conservation biology: Advantages, limitations and practical recommendations. Mol. Ecol. 2017, 26, 5369–5406.
- Buerkle, C.A.; Gompert, Z. Population genomics based on low coverage sequencing: How low should we go? Mol. Ecol. 2013, 22, 3028–3035.
- Hermosilla-Albala, N.; Silva, F.E.; Cuadros-Espinoza, S.; Fontsere, C.; Valenzuela-Seba, A.; Pawar, H.; Gut, M.; Kelley, J.L.; Ruibal-Puertas, S.; Alentorn-Moron, P.; et al. Whole genomes of Amazonian uakari monkeys reveal complex connectivity and fast differentiation driven by high environmental dynamism. Commun. Biol. 2024, 7, 1283.
- Liu, S.; Hansen, M.M. PSMC analysis of RAD sequencing data. Mol. Ecol. Resour. 2017, 17, 631–641.
- Pan, B.; Kusko, R.; Xiao, W.; Zheng, Y.; Liu, Z.; Xiao, C.; Sakkiah, S.; Guo, W.; Gong, P.; Zhang, C.; et al. Similarities and differences between variants called with human reference genome HG19 or HG38. BMC Bioinform. 2019, 20, 101.
- Günther, T.; Nettelblad, C. The presence and impact of reference bias on population genomic studies of prehistoric human populations. PLoS Genet. 2019, 15, e1008302.
- Bergström, A.; Stanton, D.W.G.; Taron, U.H.; Frantz, L.; Sinding, M.H.S.; Ersmark, E.; Pfrengle, S.; Cassatt-Johnstone, M.; Lebrasseur, O.; Girdland-Flink, L.; et al. Grey wolf genomic history reveals a dual ancestry of dogs. Nature 2022, 607, 313–320.
- Battilani, D.; Gargiulo, R.; Caniglia, R.; Fabbri, E.; Madrigal, J.R.; Fontsere, C.; Ciucani, M.M.; Gopalakrishnan, S.; Girardi, M.; Fracasso, I.; et al. Beyond population size: Whole-genome data reveal bottleneck legacies in the peninsular Italian wolf. J. Hered. 2025, 116, 10–23.
- Tournebize, R.; Chikhi, L. Ignoring population structure in hominin evolutionary models can lead to the inference of spurious admixture events. Nat. Ecol. Evol. 2025, 9, 225–236.
- Cahill, J.A.; Soares, A.E.; Green, R.E.; Shapiro, B. Inferring species divergence times using pairwise sequentially Markovian coalescent modelling and low-coverage genomic data. Philos. Trans. R. Soc. B 2016, 371, 20150138.
- Patton, A.H.; Margres, M.J.; Stahlke, A.R.; Hendricks, S.; Lewallen, K.; Hamede, R.K.; Ruiz-Aravena, M.; Ryder, O.; McCallum, H.I.; Jones, M.E.; et al. Contemporary demographic reconstruction methods are robust to genome assembly quality: A case study in Tasmanian devils. Mol. Biol. Evol. 2019, 36, 2906–2921.
- Peede, D.; Cousins, T.; Durvasula, A.; Ignatieva, A.; Kovacs, T.G.; Nieto, A.; Puckett, E.E.; Chevy, E.T. Not just Ne no more: New applications for SMC from ecology to phylogenies. Genome Biol. Evol. 2026, 18, evaf229.
- Daetwyler, H.D.; Capitan, A.; Pausch, H.; Stothard, P.; Van Binsbergen, R.; Brøndum, R.F.; Liao, X.; Djari, A.; Rodriguez, S.C.; Grohs, C.; et al. Whole-genome sequencing of 234 bulls facilitates mapping of monogenic and complex traits in cattle. Nat. Genet. 2014, 46, 858–865.
- Sarabia, C.; Salado, I.; Fernández-Gil, A.; Vonholdt, B.M.; Hofreiter, M.; Vila, C.; Leonard, J.A. Potential adaptive introgression from dogs in Iberian grey wolves (Canis lupus). Mol. Ecol. 2025, 34, e17639.



© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Iglesias-Santos, F.; Nieto, A.; Casillas, S.; Barbadilla, A.; Sarabia, C. PSMC-FAC: Automated Optimization of False-Negative Rate Corrections for Low-Coverage PSMC-Based Demographic Inference. Biology 2026, 15, 631. https://doi.org/10.3390/biology15080631

