Open Access Article
Computers 2017, 6(4), 29; doi:10.3390/computers6040029

Application of Machine Learning Models in Error and Variant Detection in High-Variation Genomics Datasets

1 Faculty of Mathematics and Informatics, Sofia University, 5 James Bourchier Blvd., 1164 Sofia, Bulgaria
2 Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, Acad. Georgi Bonchev Str., Bl. 8, 1113 Sofia, Bulgaria
This paper is an extended version of the paper “Machine learning models in error and variant detection in high-variation high-throughput sequencing datasets” presented at the International Conference on Computational Science ICCS 2017 (Zürich, Switzerland, 12–14 June 2017) and published in Procedia Computer Science, Vol. 108 (2017).
* Author to whom correspondence should be addressed.
Received: 7 October 2017 / Revised: 5 November 2017 / Accepted: 7 November 2017 / Published: 10 November 2017

Abstract

In metagenomics datasets, datasets of complex polyploid genomes, and other high-variation genomics datasets, analysis, error detection and variant calling are difficult because sequencing errors are hard to distinguish from biological variation. Confirming base candidates by their high frequency of occurrence is no longer reliable, because natural variation means that rare bases are legitimately present. This paper discusses an approach that applies machine learning models to classify bases as either sequencing errors or rare variants. Potential error candidates are first preselected with a weighted frequency measure, which uses inter-sequence pairwise similarity to focus on unexpected variation. Different similarity measures are used to account for different types of datasets. Four machine learning models are implemented and tested.
Keywords: machine learning; error discovery; variant calling; metagenomics; polyploid genomes
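
The abstract does not give the formula for the weighted frequency measure, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes equal-length, pre-aligned reads, uses plain sequence identity as the pairwise similarity, and flags a base as an error candidate when its similarity-weighted frequency falls below a threshold. The function names, the identity measure and the threshold value of 0.2 are all hypothetical.

```python
from typing import List, Tuple


def identity(a: str, b: str) -> float:
    """Hypothetical similarity: fraction of aligned positions where two reads agree."""
    if len(a) != len(b):
        raise ValueError("reads must be aligned to the same length")
    if not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def weighted_frequency(reads: List[str], read_idx: int, col: int) -> float:
    """Similarity-weighted share of the other reads that agree with
    reads[read_idx] at column col. Reads that resemble the target read
    count more, so disagreement among close relatives lowers the score."""
    target = reads[read_idx]
    base = target[col]
    agree = 0.0
    total = 0.0
    for i, other in enumerate(reads):
        if i == read_idx:
            continue
        weight = identity(target, other)
        total += weight
        if other[col] == base:
            agree += weight
    return agree / total if total > 0 else 0.0


def error_candidates(reads: List[str], threshold: float = 0.2) -> List[Tuple[int, int]]:
    """Return (read index, column) pairs whose weighted frequency is below
    the (assumed) threshold; these would go on to a classifier."""
    return [
        (i, c)
        for i, read in enumerate(reads)
        for c in range(len(read))
        if weighted_frequency(reads, i, c) < threshold
    ]


if __name__ == "__main__":
    reads = ["ACGT", "ACGT", "ACGT", "ACGA"]
    print(error_candidates(reads))
```

In this toy alignment only the final base of the last read is preselected, because every similar read disagrees with it at that column. In the approach the abstract describes, such candidates would then be passed to a machine learning model that separates errors from rare variants.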
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. (CC BY 4.0).

Krachunov, M.; Nisheva, M.; Vassilev, D. Application of Machine Learning Models in Error and Variant Detection in High-Variation Genomics Datasets. Computers 2017, 6, 29.

Computers, EISSN 2073-431X. Published by MDPI AG, Basel, Switzerland.