Open Access Article

SiNPle: Fast and Sensitive Variant Calling for Deep Sequencing Data

Integrative Biology and Bioinformatics, The Pirbright Institute, Woking GU24 0NF, UK
* Author to whom correspondence should be addressed.
Current Address: Big Data Institute, Nuffield Department of Medicine, University of Oxford, Oxford OX1 2JD, UK.
These authors contributed equally to this work.
Genes 2019, 10(8), 561; https://doi.org/10.3390/genes10080561
Received: 13 April 2019 / Revised: 8 July 2019 / Accepted: 16 July 2019 / Published: 25 July 2019
(This article belongs to the Special Issue Pipeline Tools for Next Generation Sequencing Analysis)

Abstract

Current high-throughput sequencing technologies can generate sequence data and provide information on the genetic composition of samples at very high coverage. Deep sequencing approaches enable the detection of rare variants in heterogeneous samples, such as viral quasi-species, but also have the undesired effect of amplifying sequencing errors and artefacts. Distinguishing real variants from such noise is not straightforward. Variant callers that can handle pooled samples can struggle at extremely high read depths, while at lower depths sensitivity is often sacrificed for specificity. In this paper, we propose SiNPle (Simplified Inference of Novel Polymorphisms from Large coveragE), a fast and effective software tool for variant calling. SiNPle is based on a simplified Bayesian approach to compute the posterior probability that a variant is not generated by sequencing errors or PCR artefacts. The Bayesian model takes into consideration individual base qualities as well as their distribution, the baseline error rates during both the sequencing and the PCR stages, the prior distribution of variant frequencies, and their strandedness. Our approach leads to an approximate but extremely fast computation of posterior probabilities even for very high coverage data, since the expression for the posterior distribution is a simple analytical formula in terms of summary statistics for the variants appearing at each site in the genome. These statistics can be used to filter out putative SNPs and indels according to the required level of sensitivity. We tested SiNPle on several simulated and real-life viral datasets to show that it is faster and more sensitive than existing methods. The source code for SiNPle is freely available to download and compile, or as a Conda/Bioconda package.
Keywords: next generation sequencing; low-frequency variants; heterogeneous populations; Bayesian modelling
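
The general idea of a posterior probability that a variant is real rather than a sequencing error can be illustrated with a toy calculation. The sketch below is not SiNPle's formula, which is not given in this abstract. It is a minimal illustration that assumes a binomial error model, an error rate derived from the mean Phred quality of the variant-supporting bases, and a log-uniform prior on the variant frequency. It ignores PCR errors and strandedness entirely. The function name `variant_posterior` and the parameters `prior_variant`, `f_min`, `f_max` and `grid_size` are hypothetical.

```python
import math


def phred_to_error(q):
    """Convert a Phred quality score to an error probability."""
    return 10.0 ** (-q / 10.0)


def log_binom_pmf(k, n, p):
    """Log of the binomial probability of k successes in n trials."""
    p = min(max(p, 1e-300), 1.0 - 1e-16)  # keep the logarithms finite
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def variant_posterior(depth, alt_quals, prior_variant=1e-3,
                      f_min=1e-4, f_max=0.5, grid_size=200):
    """Toy posterior probability that an alternative allele is a real variant.

    depth     -- total number of reads covering the site
    alt_quals -- Phred qualities of the bases supporting the alternative allele
    All other parameters are illustrative assumptions, not SiNPle defaults.
    """
    k = len(alt_quals)
    if k > depth:
        raise ValueError("more variant reads than total depth")
    if k == 0:
        return 0.0

    # Assumed error rate towards one specific wrong base (hence the factor 3).
    err = sum(phred_to_error(q) for q in alt_quals) / k / 3.0

    # Hypothesis "error only": every variant read is a sequencing error.
    log_l0 = log_binom_pmf(k, depth, err)

    # Hypothesis "real variant": average the likelihood over a log-spaced
    # frequency grid, which corresponds to a prior proportional to 1/f.
    ratio = f_max / f_min
    logs = []
    for i in range(grid_size):
        f = f_min * ratio ** (i / (grid_size - 1))
        logs.append(log_binom_pmf(k, depth, f + (1.0 - f) * err))
    top = max(logs)
    log_l1 = top + math.log(sum(math.exp(x - top) for x in logs) / grid_size)

    # Combine prior odds and likelihood ratio into a posterior probability.
    log_odds = math.log(prior_variant / (1.0 - prior_variant)) + log_l1 - log_l0
    if log_odds > 700.0:
        return 1.0
    if log_odds < -700.0:
        return 0.0
    return 1.0 / (1.0 + math.exp(-log_odds))
```

Under these assumptions, a site with many high-quality variant reads receives a posterior close to 1, while a handful of variant reads that is compatible with the expected number of errors receives a posterior below the prior. A threshold on this value would then play the role of the sensitivity filter mentioned in the abstract.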

This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).

MDPI and ACS Style

Ferretti, L.; Tennakoon, C.; Silesian, A.; Freimanis, G.; Ribeca, P. SiNPle: Fast and Sensitive Variant Calling for Deep Sequencing Data. Genes 2019, 10, 561.


Genes, EISSN 2073-4425, published by MDPI AG, Basel, Switzerland.